From: mike tancsa
To: Chris6 via freebsd-hardware
Date: Tue, 1 Oct 2024 17:02:07 -0400
Subject: Re: watchdog timer programming
Message-ID: <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net>
In-Reply-To: <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net>
List-Id: General discussion of FreeBSD hardware
List-Archive: https://lists.freebsd.org/archives/freebsd-hardware

On 10/1/2024 4:03 PM, mike tancsa wrote:
> On 10/1/2024 2:07 AM, Stephane Rochoy wrote:
>>
>> mike tancsa writes:
>>
>>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>>>>
>>>> mike tancsa writes:
>>>>
>>>>> Do you know off hand how to set the system to just reboot ? The
>>>>> ddb man page seems to imply I need options DDB as well, which is
>>>>> not in GENERIC in order to set script actions.
>>>>
>>>> I would try the following:
>>>>
>>>>  ddb script kdb.enter.default=reset
>>>>
>>> If I build a custom kernel then that will work. But with GENERIC (I am
>>> tracking project via freebsd-update), it fails
>>>
>>> # ddb script kdb.enter.default=reset
>>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
>>>
>>> With a custom kernel, adding
>>>
>>> options DDB
>>>
>>> it works perfectly.
>>>
>>> Is there any way to get this to work without having ddb custom
>>> compiled in ?
>>
>> I don't understand what's happening here. AFAIK, the code
>> corresponding to the soft watchdog being triggered is the
>> following:
>>
>>  static void
>>  wd_timeout_cb(void *arg)
>>  {
>>    const char *type = arg;
>>
>>  #ifdef DDB
>>    if ((wd_pretimeout_act & WD_SOFT_DDB)) {
>>      char kdb_why[80];
>>      snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout", type);
>>      kdb_backtrace();
>>      kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
>>    }
>>  #endif
>>    if ((wd_pretimeout_act & WD_SOFT_LOG))
>>      log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
>>    if ((wd_pretimeout_act & WD_SOFT_PRINTF))
>>      printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
>>    if ((wd_pretimeout_act & WD_SOFT_PANIC))
>>      panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
>>  }
>>
>> So without DDB, it should call panic. But in your case, it
>> called kdb_backtrace. So initial hypothesis was wrong.
>> What I missed is that panic was natively able to kdb_backtrace if
>> gently asked to do so:
>>
>>  #ifdef KDB
>>    if ((newpanic || trace_all_panics) && trace_on_panic)
>>      kdb_backtrace();
>>    if (debugger_on_panic)
>>      kdb_enter(KDB_WHY_PANIC, "panic");
>>    else if (!newpanic && debugger_on_recursive_panic)
>>      kdb_enter(KDB_WHY_PANIC, "re-panic");
>>  #endif
>>    /*thread_lock(td); */
>>    td->td_flags |= TDF_INPANIC;
>>    /* thread_unlock(td); */
>>    if (!sync_on_panic)
>>      bootopt |= RB_NOSYNC;
>>    if (poweroff_on_panic)
>>      bootopt |= RB_POWEROFF;
>>    if (powercycle_on_panic)
>>      bootopt |= RB_POWERCYCLE;
>>    kern_reboot(bootopt);
>>
>> So it definitely should reboot but as it doesn't, maybe playing with
>> kern.powercycle_on_panic would help?
>>
>>
>
> Thank you for your continued help on this. Still no luck with the
> GENERIC kernel
>
> 0{p9999}# sysctl -w kern.powercycle_on_panic=1
> kern.powercycle_on_panic: 0 -> 1
> 0{p9999}# ps -auxwww | grep dog
> root     4752   0.0  0.2   12820  12916  -  S watchdogd --softtimeout-action panic -t 10
> root     4792   0.0  0.0   12808   2644 u0  S+   15:39     0:00.00 grep dog
> 0{p9999}# kill -9 4752
> 0{p9999}# KDB: stack backtrace:
> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
> #1 0xffffffff80abec93 at hardclock+0x103
> #2 0xffffffff80abfe8b at handleevents+0xab
> #3 0xffffffff80ac0b7c at timercb+0x24c
> #4 0xffffffff810d0ebb at lapic_handle_timer+0xab
> #5 0xffffffff80fd8a71 at Xtimerint+0xb1
> #6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
> #7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
> #8 0xffffffff80fc49ad at cpu_idle+0x9d
> #9 0xffffffff80b67bb6 at sched_idletd+0x576
> #10 0xffffffff80aecf7f at fork_exit+0x7f
> #11 0xffffffff80fd7dae at fork_trampoline+0xe
>
> 0{p9999}#
>
> Where would be the best place to hack in something like this in the
> driver ?
>  sysctl -w debug.kdb.panic_str="Watchdog Panic"
>
> which actually does panic the box
>
>

One other datapoint.
It seems that after starting

watchdogd --softtimeout-action panic --softtimeout -t 10

and then doing kill -9 on it, it eventually prints out

watchdog soft-timeout, WD_SOFT_LOG

to dmesg.  But after that, I cannot start a new watchdogd with just

watchdogd --softtimeout-action panic -t 10

I get

watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
watchdogd: patting the dog: Invalid argument
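
To reason about which soft-timeout actions can fire on a kernel with and without options DDB, here is a rough user-space model of the dispatch in the wd_timeout_cb() quoted above. It is only a sketch: the flag values are assumed to match sys/watchdog.h and should be checked against your tree, and wd_model_fired() and wd_model_print() are hypothetical helpers, not kernel functions.

```c
#include <stdio.h>

/* Assumed to mirror sys/watchdog.h; verify against your source tree. */
#define WD_SOFT_PANIC  0x01
#define WD_SOFT_DDB    0x02
#define WD_SOFT_LOG    0x04
#define WD_SOFT_PRINTF 0x08

/*
 * Hypothetical model of the quoted wd_timeout_cb(): given the configured
 * action mask and whether the kernel was built with "options DDB",
 * return the subset of actions that would actually run.
 */
unsigned
wd_model_fired(unsigned act, int have_ddb)
{
	unsigned fired = 0;

	/* The debugger branch only exists inside #ifdef DDB. */
	if (have_ddb && (act & WD_SOFT_DDB))
		fired |= WD_SOFT_DDB;
	if (act & WD_SOFT_LOG)
		fired |= WD_SOFT_LOG;
	if (act & WD_SOFT_PRINTF)
		fired |= WD_SOFT_PRINTF;
	if (act & WD_SOFT_PANIC)
		fired |= WD_SOFT_PANIC;
	return (fired);
}

/* Print the modelled actions in the order the callback checks them. */
void
wd_model_print(unsigned fired)
{
	if (fired & WD_SOFT_DDB)
		printf("kdb_backtrace + kdb_enter\n");
	if (fired & WD_SOFT_LOG)
		printf("log\n");
	if (fired & WD_SOFT_PRINTF)
		printf("printf\n");
	if (fired & WD_SOFT_PANIC)
		printf("panic\n");
	if (fired == 0)
		printf("nothing\n");
}
```

In this model, a mask of only WD_SOFT_DDB on a kernel without DDB fires nothing, while WD_SOFT_PANIC reaches panic() either way. That is why the backtrace-without-reboot above suggests the configured action mask, or the code path being taken, is not what the command line implies.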