Re: watchdog timer programming (progress)
- Reply: Stephane Rochoy : "Re: watchdog timer programming (progress)"
- In reply to: mike tancsa : "Re: watchdog timer programming"
Date: Wed, 02 Oct 2024 00:40:17 UTC
On 10/1/2024 5:02 PM, mike tancsa wrote:
> On 10/1/2024 4:03 PM, mike tancsa wrote:
>> On 10/1/2024 2:07 AM, Stephane Rochoy wrote:
>>>
>>> mike tancsa <mike@sentex.net> writes:
>>>
>>>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>>>>>
>>>>> mike tancsa <mike@sentex.net> writes:
>>>>>
>>>>>> Do you know offhand how to set the system to just reboot? The ddb
>>>>>> man page seems to imply I need options DDB as well, which is not in
>>>>>> GENERIC, in order to set script actions.
>>>>>
>>>>> I would try the following:
>>>>>
>>>>> ddb script kdb.enter.default=reset
>>>>>
>>>> If I build a custom kernel then that will work. But with GENERIC (I am
>>>> tracking project via freebsd-update), it fails:
>>>>
>>>> # ddb script kdb.enter.default=reset
>>>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
>>>>
>>>> With a custom kernel, adding
>>>>
>>>> options DDB
>>>>
>>>> it works perfectly.
>>>>
>>>> Is there any way to get this to work without having ddb custom
>>>> compiled in?
>>>
>>> I don't understand what's happening here.
>>> AFAIK, the code corresponding to the soft watchdog being triggered
>>> is the following:
>>>
>>> static void
>>> wd_timeout_cb(void *arg)
>>> {
>>>     const char *type = arg;
>>>
>>> #ifdef DDB
>>>     if ((wd_pretimeout_act & WD_SOFT_DDB)) {
>>>         char kdb_why[80];
>>>         snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout",
>>>             type);
>>>         kdb_backtrace();
>>>         kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
>>>     }
>>> #endif
>>>     if ((wd_pretimeout_act & WD_SOFT_LOG))
>>>         log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
>>>     if ((wd_pretimeout_act & WD_SOFT_PRINTF))
>>>         printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
>>>     if ((wd_pretimeout_act & WD_SOFT_PANIC))
>>>         panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
>>> }
>>>
>>> So without DDB, it should call panic. But in your case, it called
>>> kdb_backtrace, so the initial hypothesis was wrong. What I missed is
>>> that panic is natively able to call kdb_backtrace if gently asked to
>>> do so:
>>>
>>> #ifdef KDB
>>>     if ((newpanic || trace_all_panics) && trace_on_panic)
>>>         kdb_backtrace();
>>>     if (debugger_on_panic)
>>>         kdb_enter(KDB_WHY_PANIC, "panic");
>>>     else if (!newpanic && debugger_on_recursive_panic)
>>>         kdb_enter(KDB_WHY_PANIC, "re-panic");
>>> #endif
>>>     /*thread_lock(td); */
>>>     td->td_flags |= TDF_INPANIC;
>>>     /* thread_unlock(td); */
>>>     if (!sync_on_panic)
>>>         bootopt |= RB_NOSYNC;
>>>     if (poweroff_on_panic)
>>>         bootopt |= RB_POWEROFF;
>>>     if (powercycle_on_panic)
>>>         bootopt |= RB_POWERCYCLE;
>>>     kern_reboot(bootopt);
>>>
>>> So it definitely should reboot, but as it doesn't, maybe playing with
>>> kern.powercycle_on_panic would help?
>>>
>>
>> Thank you for your continued help on this.
>> Still no luck with the GENERIC kernel:
>>
>> 0{p9999}# sysctl -w kern.powercycle_on_panic=1
>> kern.powercycle_on_panic: 0 -> 1
>> 0{p9999}# ps -auxwww | grep dog
>> root 4752 0.0 0.2 12820 12916 - S<s 15:38 0:00.01 watchdogd --softtimeout-action panic -t 10
>> root 4792 0.0 0.0 12808 2644 u0 S+ 15:39 0:00.00 grep dog
>> 0{p9999}# kill -9 4752
>> 0{p9999}# KDB: stack backtrace:
>> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
>> #1 0xffffffff80abec93 at hardclock+0x103
>> #2 0xffffffff80abfe8b at handleevents+0xab
>> #3 0xffffffff80ac0b7c at timercb+0x24c
>> #4 0xffffffff810d0ebb at lapic_handle_timer+0xab
>> #5 0xffffffff80fd8a71 at Xtimerint+0xb1
>> #6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
>> #7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
>> #8 0xffffffff80fc49ad at cpu_idle+0x9d
>> #9 0xffffffff80b67bb6 at sched_idletd+0x576
>> #10 0xffffffff80aecf7f at fork_exit+0x7f
>> #11 0xffffffff80fd7dae at fork_trampoline+0xe
>>
>> 0{p9999}#
>>
>> Where would be the best place in the driver to hack in something like
>> this?
>>
>> sysctl -w debug.kdb.panic_str="Watchdog Panic"
>>
>> That sysctl actually does panic the box.
>>
>
> One other data point. It seems that after starting
>
> watchdogd --softtimeout-action panic --softtimeout -t 10
>
> and then doing a kill -9, it eventually prints out
>
> watchdog soft-timeout, WD_SOFT_LOG
>
> to dmesg.
> But after that, I cannot start a new watchdogd with just
>
> watchdogd --softtimeout-action panic -t 10
>
> I get
>
> watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
> watchdogd: patting the dog: Invalid argument

I made these 2 changes to the driver:

--- watchdog.c	2024-10-01 20:37:28.667869000 -0400
+++ /tmp/watchdog.c	2024-10-01 20:36:59.764330000 -0400
@@ -61,7 +61,8 @@
 static struct callout wd_softtimeo_handle;
 static int wd_softtimer;	/* true = use softtimer instead of hardware watchdog */
-static int wd_softtimeout_act = WD_SOFT_LOG;	/* action for the software timeout */
+// static int wd_softtimeout_act = WD_SOFT_LOG;	/* action for the software timeout */
+static int wd_softtimeout_act = WD_SOFT_PANIC;	/* action for the software timeout */
 static struct cdev *wd_dev;
 static volatile u_int wd_last_u;	/* last timeout value set by kern_do_pat */
@@ -241,6 +242,7 @@
 wd_timeout_cb(void *arg)
 {
 	const char *type = arg;
+	panic("mdt watchdog %s-timeout, WD_SOFT_PANIC set", type);
 
 #ifdef DDB
 	if ((wd_pretimeout_act & WD_SOFT_DDB)) {

and it works now:

KDB: stack backtrace:
#0 0xffffffff80b8943d at kdb_backtrace+0x5d
#1 0xffffffff80b3bfd1 at vpanic+0x131
#2 0xffffffff80b3be93 at panic+0x43
#3 0xffffffff8098b585 at wd_timeout_cb+0x15
#4 0xffffffff80b59fcc at softclock_call_cc+0x12c
#5 0xffffffff80b5b815 at softclock_thread+0xe5
#6 0xffffffff80af61df at fork_exit+0x7f
#7 0xffffffff80ff76ce at fork_trampoline+0xe
Uptime: 1m13s

It seems the soft timeout action value is never overridden for some
reason. This kinda feels like a bug / PR?

---Mike