Re: watchdog timer programming

From: mike tancsa <mike_at_sentex.net>
Date: Tue, 01 Oct 2024 20:03:23 UTC
On 10/1/2024 2:07 AM, Stephane Rochoy wrote:
>
> mike tancsa <mike@sentex.net> writes:
>
>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>>>
>>> mike tancsa <mike@sentex.net> writes:
>>>
>>>> Do you know off hand how to set the system to just reboot ? The ddb 
>>>> man
>>>> page seems to imply I need options DDB as well, which is not in 
>>>> GENERIC
>>>> in order to set script actions.
>>>
>>> I would try the following:
>>>
>>>  ddb script kdb.enter.default=reset
>>>
>> If I build a custom kernel then that will work. But with GENERIC (I am
>> tracking project via freebsd-update), it fails
>>
>> # ddb script kdb.enter.default=reset
>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
>>
>> With a custom kernel, adding
>>
>> options DDB
>>
>> it works perfectly.
>>
>> Is there any way to get this to work without having ddb custom
>> compiled in ?
>
> I don't understand what's happening here. AFAIK, the code
> corresponding to the soft watchdog being triggered is the
> following:
>
>  static void
>  wd_timeout_cb(void *arg)
>  {
>    const char *type = arg;
>
>  #ifdef DDB
>    if ((wd_pretimeout_act & WD_SOFT_DDB)) {
>      char kdb_why[80];
>      snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout", type);
>      kdb_backtrace();
>      kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
>    }
>  #endif
>    if ((wd_pretimeout_act & WD_SOFT_LOG))
>      log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
>    if ((wd_pretimeout_act & WD_SOFT_PRINTF))
>      printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
>    if ((wd_pretimeout_act & WD_SOFT_PANIC))
>      panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
>  }
>
> So without DDB, it should call panic. But in your case, it
> called kdb_backtrace, so my initial hypothesis was wrong. What I
> missed is that panic itself can call kdb_backtrace if asked to
> do so:
>
>  #ifdef KDB
>    if ((newpanic || trace_all_panics) && trace_on_panic)
>      kdb_backtrace();
>    if (debugger_on_panic)
>      kdb_enter(KDB_WHY_PANIC, "panic");
>    else if (!newpanic && debugger_on_recursive_panic)
>      kdb_enter(KDB_WHY_PANIC, "re-panic");
>  #endif
>    /*thread_lock(td); */
>    td->td_flags |= TDF_INPANIC;
>    /* thread_unlock(td); */
>    if (!sync_on_panic)
>      bootopt |= RB_NOSYNC;
>    if (poweroff_on_panic)
>      bootopt |= RB_POWEROFF;
>    if (powercycle_on_panic)
>      bootopt |= RB_POWERCYCLE;
>    kern_reboot(bootopt);
>
> So it definitely should reboot, but as it doesn't, maybe playing with
> kern.powercycle_on_panic would help?
>
>

Thank you for your continued help on this. Still no luck with the GENERIC kernel:

0{p9999}# sysctl -w kern.powercycle_on_panic=1
kern.powercycle_on_panic: 0 -> 1
0{p9999}# ps -auxwww | grep dog
root     4752   0.0  0.2   12820  12916  -  S<s  15:38 0:00.01 watchdogd --softtimeout-action panic -t 10
root     4792   0.0  0.0   12808   2644 u0  S+   15:39     0:00.00 grep dog
0{p9999}# kill -9 4752
0{p9999}# KDB: stack backtrace:
#0 0xffffffff80b7fefd at kdb_backtrace+0x5d
#1 0xffffffff80abec93 at hardclock+0x103
#2 0xffffffff80abfe8b at handleevents+0xab
#3 0xffffffff80ac0b7c at timercb+0x24c
#4 0xffffffff810d0ebb at lapic_handle_timer+0xab
#5 0xffffffff80fd8a71 at Xtimerint+0xb1
#6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
#7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
#8 0xffffffff80fc49ad at cpu_idle+0x9d
#9 0xffffffff80b67bb6 at sched_idletd+0x576
#10 0xffffffff80aecf7f at fork_exit+0x7f
#11 0xffffffff80fd7dae at fork_trampoline+0xe

0{p9999}#

Where in the driver would be the best place to hack in the equivalent of

  sysctl -w debug.kdb.panic_str="Watchdog Panic"

which actually does panic the box?
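
For anyone following along, the effect of building without options DDB on the wd_timeout_cb() quoted above can be modelled in userland. This is only a sketch, not the driver code: wd_model_actions() and have_ddb are made-up names, and the flag values are my assumption of what sys/watchdog.h defines, so check the header before relying on them.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values for this sketch only; verify against sys/watchdog.h. */
#define WD_SOFT_PANIC   0x01
#define WD_SOFT_DDB     0x02
#define WD_SOFT_LOG     0x04
#define WD_SOFT_PRINTF  0x08
#define WD_SOFT_MASK    0x0f

/*
 * Hypothetical userland model of the branches in the quoted
 * wd_timeout_cb(). 'act' plays the role of wd_pretimeout_act and
 * 'have_ddb' stands in for the #ifdef DDB guard. Returns the subset
 * of 'act' whose branch would actually be executed.
 */
int
wd_model_actions(int act, int have_ddb)
{
    int fired = 0;

    assert((act & ~WD_SOFT_MASK) == 0);

    /* This branch is compiled out entirely when DDB is not configured. */
    if (have_ddb && (act & WD_SOFT_DDB))
        fired |= WD_SOFT_DDB;
    if (act & WD_SOFT_LOG)
        fired |= WD_SOFT_LOG;
    if (act & WD_SOFT_PRINTF)
        fired |= WD_SOFT_PRINTF;
    if (act & WD_SOFT_PANIC)
        fired |= WD_SOFT_PANIC;
    return (fired);
}

/* Print the modelled result for one configuration. */
void
wd_model_report(int act, int have_ddb)
{
    printf("act=0x%02x have_ddb=%d -> fired=0x%02x\n",
        act, have_ddb, wd_model_actions(act, have_ddb));
}
```

With have_ddb set to 0, a request for WD_SOFT_DDB | WD_SOFT_PANIC reduces to WD_SOFT_PANIC alone, which matches the reading that a kernel without DDB should go straight to panic().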

     ---Mike