making SW_WATCHDOG dynamic

Andriy Gapon avg at FreeBSD.org
Wed Dec 27 13:46:23 UTC 2017


On 26/12/2017 16:25, Mike Karels wrote:
> There is a kernel option, SW_WATCHDOG, which adds a low-level software
> watchdog in hardclock.  By default, the kernel and watchdogd support
> only hardware-based watchdogs.  There is also a callout-based software
> watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
> is specified, but watchdogd doesn't switch on its own.  The SW_WATCHDOG
> option adds a lower-level software watchdog to the hardware-based mechanism,
> but it adds it unconditionally.  I propose to include the SW_WATCHDOG
> facility by default, but enable it only if there is no hardware watchdog.

I think that this is a good idea, although I would not necessarily tie the
software watchdog to the absence of a hardware watchdog.  That is probably a
good default policy, but I would also allow enabling or disabling the software
watchdog explicitly (e.g. via a sysctl).
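
A minimal sketch of that policy, written as plain C rather than kernel code.
The tri-state `swd_mode` setting and the function name are hypothetical; they
only illustrate how an explicit override could sit on top of the "no hardware
watchdog" default, and do not correspond to any existing sysctl or kernel
symbol.

```c
#include <stdbool.h>

/*
 * Hypothetical tri-state setting, e.g. the value a sysctl might hold.
 * None of these names exist in the FreeBSD tree.
 */
enum swd_mode {
	SWD_AUTO = -1,	/* default: follow hardware watchdog presence */
	SWD_OFF  = 0,	/* administrator explicitly disabled it */
	SWD_ON   = 1	/* administrator explicitly enabled it */
};

/*
 * Decide whether the software watchdog should run.
 * An explicit setting always wins; otherwise the software watchdog
 * is used only when no hardware watchdog is available.
 */
static bool
swd_should_run(enum swd_mode mode, bool have_hw_watchdog)
{
	switch (mode) {
	case SWD_ON:
		return (true);
	case SWD_OFF:
		return (false);
	case SWD_AUTO:
	default:
		return (!have_hw_watchdog);
	}
}
```

With this shape, the proposed default behaviour is just the SWD_AUTO case, and
an explicit setting can run the software watchdog next to a hardware one.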

I also think that we should support enabling several watchdog timers with
different timeouts.  Each of them can serve a different purpose.  E.g., a
software or hardware NMI-sending watchdog can be used to get diagnostic data out
of a hung system while a resetting watchdog can be used to ensure fail-safe
operation.
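
To make the idea concrete, here is a small user-space simulation of several
watchdog timers with different timeouts and different expiry actions.  It is a
sketch only: `struct wd_timer`, `wd_pat()` and `wd_tick()` are hypothetical
names, time is counted in abstract ticks, and the "NMI" and "reset" actions are
just enum values, not real kernel behaviour.

```c
#include <stddef.h>

/* Expiry actions, ordered by severity (hypothetical names). */
enum wd_action { WD_NONE = 0, WD_NMI = 1, WD_RESET = 2 };

struct wd_timer {
	unsigned	timeout;	/* ticks without a pat before firing */
	unsigned	elapsed;	/* ticks since the last pat */
	enum wd_action	action;		/* what to do on expiry */
};

/* Pat the watchdog: restart every configured timer. */
static void
wd_pat(struct wd_timer *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		t[i].elapsed = 0;
}

/*
 * Advance all timers by one tick.  Each timer fires once, on the tick
 * where it reaches its timeout.  Returns the most severe action that
 * fired on this tick, or WD_NONE.
 */
static enum wd_action
wd_tick(struct wd_timer *t, size_t n)
{
	enum wd_action fired = WD_NONE;

	for (size_t i = 0; i < n; i++) {
		if (t[i].elapsed >= t[i].timeout)
			continue;	/* already expired, do not re-fire */
		t[i].elapsed++;
		if (t[i].elapsed == t[i].timeout && t[i].action > fired)
			fired = t[i].action;
	}
	return (fired);
}
```

With one timer set to 5 ticks / WD_NMI and another to 10 ticks / WD_RESET, a
hung system would first get the diagnostic NMI and, if it stays hung, the reset
five ticks later.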

> I'm interested in any comments, suggestions, or background; feel free to
> mail me off the list.  If there are multiple people interested, I'll
> forward messages to that group.
> 
> I want to make the change because I have found SW_WATCHDOG quite useful
> at $JOB, and it's annoying to have to build a custom kernel just for this
> (not just once, but every time there is a kernel patch).

Makes sense.

> Also, I'm curious why we have two software watchdog facilities.  The
> --softwatchdog facility has various options on expiration, such as
> printf/log/panic; I don't know why anything other than panic/reboot
> would be desirable, though.  I already contacted some of the people who
> have left fingerprints on watchdog.  Also, if anyone wants to review
> the code, let me know.

I guess that the second software watchdog was added to achieve what I suggested
above.  Of course, it would have been nicer to re-use SW_WATCHDOG for that
purpose and to add more generic support for configuring multiple watchdog
timers with different timeouts.  But I guess that adding a new single-purpose
software watchdog was much easier to do.

P.S.
And maybe just using the second software watchdog would be good enough for what
you are doing?

-- 
Andriy Gapon


More information about the freebsd-arch mailing list