Re: watchdog timer programming
- Reply: mike tancsa : "Re: watchdog timer programming (progress)"
- In reply to: mike tancsa : "Re: watchdog timer programming"
Date: Tue, 01 Oct 2024 21:02:07 UTC
On 10/1/2024 4:03 PM, mike tancsa wrote:
> On 10/1/2024 2:07 AM, Stephane Rochoy wrote:
>>
>> mike tancsa <mike@sentex.net> writes:
>>
>>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>>>>
>>>> mike tancsa <mike@sentex.net> writes:
>>>>
>>>>> Do you know off hand how to set the system to just reboot? The
>>>>> ddb man page seems to imply I need options DDB as well, which is
>>>>> not in GENERIC, in order to set script actions.
>>>>
>>>> I would try the following:
>>>>
>>>> ddb script kdb.enter.default=reset
>>>>
>>> If I build a custom kernel then that will work. But with GENERIC (I am
>>> tracking project via freebsd-update), it fails
>>>
>>> # ddb script kdb.enter.default=reset
>>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
>>>
>>> With a custom kernel, adding
>>>
>>> options DDB
>>>
>>> it works perfectly.
>>>
>>> Is there any way to get this to work without having ddb custom
>>> compiled in?
>>
>> I don't understand what's happening here. AFAIK, the code
>> corresponding to the soft watchdog being triggered is the
>> following:
>>
>> static void
>> wd_timeout_cb(void *arg)
>> {
>>         const char *type = arg;
>>
>> #ifdef DDB
>>         if ((wd_pretimeout_act & WD_SOFT_DDB)) {
>>                 char kdb_why[80];
>>                 snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout",
>>                     type);
>>                 kdb_backtrace();
>>                 kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
>>         }
>> #endif
>>         if ((wd_pretimeout_act & WD_SOFT_LOG))
>>                 log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
>>         if ((wd_pretimeout_act & WD_SOFT_PRINTF))
>>                 printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
>>         if ((wd_pretimeout_act & WD_SOFT_PANIC))
>>                 panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
>> }
>>
>> So without DDB, it should call panic. But in your case, it
>> called kdb_backtrace. So initial hypothesis was wrong. What I
>> missed is that panic was natively able to kdb_backtrace if gently
>> asked to do so:
>>
>> #ifdef KDB
>>         if ((newpanic || trace_all_panics) && trace_on_panic)
>>                 kdb_backtrace();
>>         if (debugger_on_panic)
>>                 kdb_enter(KDB_WHY_PANIC, "panic");
>>         else if (!newpanic && debugger_on_recursive_panic)
>>                 kdb_enter(KDB_WHY_PANIC, "re-panic");
>> #endif
>>         /*thread_lock(td); */
>>         td->td_flags |= TDF_INPANIC;
>>         /* thread_unlock(td); */
>>         if (!sync_on_panic)
>>                 bootopt |= RB_NOSYNC;
>>         if (poweroff_on_panic)
>>                 bootopt |= RB_POWEROFF;
>>         if (powercycle_on_panic)
>>                 bootopt |= RB_POWERCYCLE;
>>         kern_reboot(bootopt);
>>
>> So it definitely should reboot but as it doesn't, maybe playing with
>> kern.powercycle_on_panic would help?
>>
>
> Thank you for your continued help on this. Still no luck with the
> GENERIC kernel
>
> 0{p9999}# sysctl -w kern.powercycle_on_panic=1
> kern.powercycle_on_panic: 0 -> 1
> 0{p9999}# ps -auxwww | grep dog
> root  4752  0.0  0.2 12820 12916  -  S<s  15:38  0:00.01 watchdogd --softtimeout-action panic -t 10
> root  4792  0.0  0.0 12808  2644 u0  S+   15:39  0:00.00 grep dog
> 0{p9999}# kill -9 4752
> 0{p9999}# KDB: stack backtrace:
> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
> #1 0xffffffff80abec93 at hardclock+0x103
> #2 0xffffffff80abfe8b at handleevents+0xab
> #3 0xffffffff80ac0b7c at timercb+0x24c
> #4 0xffffffff810d0ebb at lapic_handle_timer+0xab
> #5 0xffffffff80fd8a71 at Xtimerint+0xb1
> #6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
> #7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
> #8 0xffffffff80fc49ad at cpu_idle+0x9d
> #9 0xffffffff80b67bb6 at sched_idletd+0x576
> #10 0xffffffff80aecf7f at fork_exit+0x7f
> #11 0xffffffff80fd7dae at fork_trampoline+0xe
>
> 0{p9999}#
>
> Where would be the best place to hack in something like this in the
> driver?
>
> sysctl -w debug.kdb.panic_str="Watchdog Panic"
>
> which actually does panic the box

One other datapoint. If I start watchdogd with

watchdogd --softtimeout-action panic --softtimeout -t 10

then after kill -9 it eventually prints

watchdog soft-timeout, WD_SOFT_LOG

to dmesg. But after that, I cannot start a new watchdogd with just

watchdogd --softtimeout-action panic -t 10

I get

watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
watchdogd: patting the dog: Invalid argument
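
For reference, the flag checks in the quoted wd_timeout_cb() can be modelled in userland to see which actions a given mask would trigger on a kernel built with or without DDB. This is only a rough sketch, not kernel code: the WD_SOFT_* values are assumed to match sys/watchdog.h and should be checked against your tree, and soft_actions() and its have_ddb argument are hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed to mirror sys/watchdog.h; verify against your source tree. */
#define WD_SOFT_PANIC   0x01
#define WD_SOFT_DDB     0x02
#define WD_SOFT_LOG     0x04
#define WD_SOFT_PRINTF  0x08

/*
 * Hypothetical userland model of the checks in wd_timeout_cb().
 * Writes the names of the actions that would fire into buf, in the
 * order the quoted kernel code tests them. have_ddb stands in for
 * "#ifdef DDB". Returns the number of actions that fire.
 */
static int
soft_actions(int act, int have_ddb, char *buf, size_t len)
{
	static const struct {
		int flag;
		int needs_ddb;
		const char *name;
	} order[] = {
		{ WD_SOFT_DDB,    1, "ddb" },
		{ WD_SOFT_LOG,    0, "log" },
		{ WD_SOFT_PRINTF, 0, "printf" },
		{ WD_SOFT_PANIC,  0, "panic" },
	};
	size_t i, off = 0;
	int n = 0, w;

	if (len > 0)
		buf[0] = '\0';
	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if ((act & order[i].flag) == 0)
			continue;
		/* The DDB branch is compiled out without options DDB. */
		if (order[i].needs_ddb && !have_ddb)
			continue;
		if (off < len) {
			w = snprintf(buf + off, len - off, "%s%s",
			    n > 0 ? " " : "", order[i].name);
			if (w > 0)
				off += (size_t)w;
		}
		n++;
	}
	return (n);
}
```

In this model, a mask of WD_SOFT_PANIC with have_ddb set to 0 (the GENERIC case in this thread) yields only "panic", which is consistent with the expectation above that panic() is reached even without options DDB. It says nothing about why the box then fails to reboot.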