SW_WATCHDOG vs new eventtimer code
Alexander Motin
mav at FreeBSD.org
Wed Sep 21 08:41:53 UTC 2011
Andriy Gapon wrote:
> on 20/09/2011 23:04 Alexander Motin said the following:
>> On 20.09.2011 22:19, Andriy Gapon wrote:
>>> just want to check with you first if the following makes sense.
>>> I use SW_WATCHDOG on one of the test machines, which was recently updated
>>> from stable/8 to head. Now it gets seemingly random watchdog events.
>>> My theory is that this is because of the eventtimer logic.
>>> If during idle period we accumulate enough timer ticks and then run all those
>>> ticks very rapidly, then the SW_WATCHDOG code may get an impression that it was
>>> not patted for many real ticks.
>>> Not sure what would be the best way to make SW_WATCHDOG happier/smarter.
>> The eventtimer code is now set to generate interrupts at least 4 times per
>> second for each CPU. Since SW_WATCHDOG only handles periods longer
>> than one second, I would say it should not be hurt. I would try to add
>> some debugging there to see what's going on (how big the tick bursts are).
>> I'll try to do it tomorrow.
I've built a kernel with SW_WATCHDOG and run watchdogd with the tightest
parameters (-s 1 -t 2), but have observed no problems so far.
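
To illustrate why a burst of at most a quarter second of ticks should not
trip a one-second watchdog, here is a rough sketch. It is not the kernel
code: the tick rate of 1000 Hz is an assumption, and max_tick_burst() and
watchdog_charge() are hypothetical names used only for this model.

```c
#include <stdbool.h>

#define HZ                 1000	/* assumed tick rate, ticks per second */
#define MIN_EVENTS_PER_SEC 4	/* "at least 4 times per second" per CPU */

/*
 * Largest number of ticks that can pile up between two timer
 * interrupts when interrupts arrive at least events_per_sec times
 * per second.
 */
static int
max_tick_burst(int hz, int events_per_sec)
{
	return (hz / events_per_sec);
}

/*
 * Hypothetical countdown model: charge a burst of ticks against the
 * remaining watchdog budget. Returns true if the budget is exhausted.
 */
static bool
watchdog_charge(int *remaining, int burst)
{
	*remaining -= burst;
	return (*remaining <= 0);
}
```

Under these assumptions max_tick_burst(HZ, MIN_EVENTS_PER_SEC) is 250
ticks, and charging that against a one-second budget of 1000 ticks leaves
750, so the watchdog would only expire if something delayed the ticks far
longer than the eventtimer code is supposed to allow.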
> Just in case, here is a debugging snippet from a panic that I've got:
> #14 0xffffffff80660ae5 in handleevents (now=0xffffff80e3e0b8b0, fake=0) at
> /usr/src/sys/kern/kern_clocksource.c:209
> 209 while (bintime_cmp(now, &state->nextstat, >=)) {
> (kgdb) list
> 204 }
> 205 if (runs && fake < 2) {
> 206 hardclock_anycpu(runs, usermode);
> 207 done = 1;
> 208 }
> 209 while (bintime_cmp(now, &state->nextstat, >=)) {
> 210 if (fake < 2)
> 211 statclock(usermode);
> 212 bintime_add(&state->nextstat, &statperiod);
> 213 done = 1;
> (kgdb) p state->nextstat
> $1 = {sec = 90, frac = 15986939599958264124}
> (kgdb) p *now
> $3 = {sec = 106, frac = 11494276814354478452}
> (kgdb) p statperiod
> $4 = {sec = 0, frac = 145249953336295682}
>
> (kgdb) fr 13
> #13 0xffffffff8042603e in hardclock_anycpu (cnt=15761, usermode=Variable
> "usermode" is not available.
> ) at atomic.h:183
> 183 atomic.h: No such file or directory.
> in atomic.h
> (kgdb) p cnt
> $5 = 15761
> (kgdb) p newticks
> $6 = 15000
> (kgdb) p watchdog_ticks
> $7 = 16000
>
> Watchdog timeout was set to ~16 seconds.
It looks like your system was out for about 15 seconds, or for some
reason the system uptime jumped 15 seconds forward. Were you doing anything
special at that moment, or have you seen anything strange in the system's
behavior? What timecounter are you using? I see you are using the HPET
eventtimer, but on what hardware (is it per-CPU or global)?
Building a kernel with KTR_SPARE2 tracing enabled should help to collect
valuable info about timer behavior before the crash.
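
For reference, the "about 15 seconds" figure can be checked from the quoted
kgdb values by subtracting state->nextstat from *now. The sketch below is a
user-space stand-in, not the kernel's bintime code: struct bt, bt_sub() and
bt_to_ms() are hypothetical names, and the fraction is taken to be in units
of 2^-64 second.

```c
#include <stdint.h>

/* Simplified stand-in for a seconds + 64-bit binary fraction timestamp. */
struct bt {
	int64_t  sec;
	uint64_t frac;	/* units of 2^-64 second */
};

/* Return a - b, borrowing one second when the fraction underflows. */
static struct bt
bt_sub(struct bt a, struct bt b)
{
	struct bt r;

	r.sec = a.sec - b.sec;
	if (a.frac < b.frac)
		r.sec--;
	/* Unsigned wraparound yields the borrowed fraction. */
	r.frac = a.frac - b.frac;
	return (r);
}

/* Convert to whole milliseconds, truncating (uses the top 32 bits of frac). */
static int64_t
bt_to_ms(struct bt t)
{
	return (t.sec * 1000 + (int64_t)(((t.frac >> 32) * 1000) >> 32));
}
```

With now = {106, 11494276814354478452} and nextstat = {90,
15986939599958264124}, bt_sub() gives 15 seconds plus a fraction, and
bt_to_ms() gives 15756 ms. If hz is 1000 (inferred from watchdog_ticks =
16000 for a ~16 second timeout), that is close to cnt = 15761 ticks.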
--
Alexander Motin