Re: ipmi0: Watchdog set returned 0xc0 (releng_13)
Date: Mon, 16 Aug 2021 14:53:00 UTC
Hi Alexander,

Thanks for the reply and info. Yes, you are right. I had the timer set to -t 30, but it actually is printing every 10 seconds. I had a look in the BIOS, and other than the one Watchdog setting in the BIOS

"Enable or disable to turn on 5-minute watch dog timer. Upon timeout, JWD1 jumper determines system behavior."

I don't see any other places to tweak the hardware watchdog. If I enable that, the box does indeed reboot after 5 minutes, even though I have watchdogd running. I am not 100% sure, but I think this used to work on other Supermicro boards. I don't have any other RELENG13 boxes on Supermicro boards to test just yet.

One other thing I noticed was that if I boot up without ipmi loaded, /dev/fido is there. Does it still see a hardware watchdog somehow, or is that pointing to something else?

If I load the kld

0{r}# kldload ipmi
ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
ipmi0: IPMI device rev. 1, firmware rev. 1.23, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

<wait 15seconds, console still clear>

0{r}#
0{r}# watchdogd
ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
0{r}# ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0

I am going to look around for a BIOS update to see if there is some fix.

---Mike

On 8/16/2021 10:09 AM, Alexander Motin wrote:
> Hi Mike,
>
> According to IPMI specification 0xc0 means: "Node Busy. Command could
> not be processed because command processing resources are temporarily
> unavailable." I have no idea what it means for the driver, but I
> suspect that you always had it; before the mentioned commit it was
> just quietly ignored. I can't propose much other than to hide it again
> if errors like that get too widespread.
> I haven't seen errors like that
> on the X11DPI-NT boards I've tested this on. I saw 0xc9 if I set the
> watchdog timeout below about a minute, for which I have no explanation
> either, but you may try to experiment with different timeouts or pat
> intervals. The error period of 30s seems interesting, considering the
> default pat period in watchdogd of 10s.
>
> On 16.08.2021 09:26, mike tancsa wrote:
>> Hi All,
>>
>> I updated a box from about a month ago, and noticed that the console
>> is full of
>>
>> ipmi0: Watchdog set returned 0xc0
>>
>> It fires every 30 seconds which is what I have the timer set to. It
>> seems to be related to the ipmi watchdog as another box I have which
>> uses ichwd doesn't spew a similar message.
>>
>> The only commit seems to be
>>
>> commit b41b86b65f10ccaa8cce8cc11a030ad464b654c0
>> Author: Alexander Motin <mav@FreeBSD.org>
>> Date: Thu Jul 29 23:39:04 2021 -0400
>>
>> Board is a Super Micro X11SCH-F. Bios 1.5 from 11/17/2020
>>
>> My kernel is not that different from GENERIC. If I do a killall -9
>> watchdogd it reboots as expected.
>>
>>
>>> device cxgbe
>>> device cryptodev
>>> options TCP_SIGNATURE
>>> options IPSEC
>>> options IPFIREWALL #firewall
>>> options IPFIREWALL_VERBOSE #enable logging to syslogd(8)
>>> options IPFIREWALL_VERBOSE_LIMIT=9100 #limit verbosity
>>> options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default
>>> #options ROUTETABLES=2
>>> option FIB_ALGO
>> sysctl.conf is
>>
>> vfs.zfs.min_auto_ashift=12
>> net.inet.ip.redirect=0
>> net.inet6.ip6.redirect=0
>> kern.ipc.maxsockbuf=16777216
>> net.inet.tcp.blackhole=1
>>
>> and loader.conf
>>
>> zfs_load="YES"
>> comconsole_speed="115200" # Set the current serial console speed
>> boot_multicons="YES"
>> boot_serial="YES"
>> console="efi"
>> ipmi_load="YES"
>> cpu_microcode_load="YES"
>> cpu_microcode_name="/boot/firmware/intel-ucode.bin"
>> comconsole_port="0x2f8"
>>
>> if_disc_load="YES"
>>
>> hw.cxgbe.toecaps_allowed="0"
>> hw.cxgbe.rdmacaps_allowed="0"
>> hw.cxgbe.iscsicaps_allowed="0"
>> hw.cxgbe.fcoecaps_allowed="0"
>> hw.cxgbe.pause_settings="0"
>> hw.cxgbe.attack_filter="1"
>> hw.cxgbe.drop_pkts_with_l3_errors="1"
>>
>> vm.pmap.pti=0
>>
>> net.inet.ip.fw.default_to_accept=1
>>
>>
>> contigmem_load="YES"
>> nic_uio_load="YES"
>> #hw.nic_uio.bdfs="2:0:0,2:0:1"
>> hw.nic_uio.bdfs="2:0:0,2:0:1,2:0:2,2:0:3"
>>
>> hw.contigmem.num_buffers=2
>> hw.contigmem.buffer_size=1073741824
>>
>> dpdk_lpm4_load="YES"
>> dpdk_lpm6_load="YES"
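
For anyone reading this thread later: the two completion codes mentioned above (0xc0 and 0xc9) are generic IPMI completion codes. The following is a minimal sketch, not part of the ipmi(4) driver, that pulls the code out of a console line and maps it to its meaning in the IPMI specification. The function names and the table are hypothetical helpers written for illustration, and the table is only a subset of the generic codes.

```python
import re

# Subset of the generic IPMI completion codes. 0xC0 and 0xC9 are the two
# values discussed in this thread; the table is illustrative, not complete.
COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xC0: "Node busy",
    0xC1: "Invalid command",
    0xC3: "Timeout while processing command",
    0xC9: "Parameter out of range",
    0xCC: "Invalid data field in request",
    0xFF: "Unspecified error",
}

# Matches the console message quoted in this thread, even when a shell
# prompt precedes it on the same line.
WATCHDOG_RE = re.compile(r"ipmi\d+: Watchdog set returned (0x[0-9a-fA-F]+)")


def parse_watchdog_line(line):
    """Return the completion code as an int, or None if the line does not match."""
    match = WATCHDOG_RE.search(line)
    return int(match.group(1), 16) if match else None


def describe(code):
    """Return a short description for a completion code (hypothetical helper)."""
    return COMPLETION_CODES.get(code, "unknown completion code 0x%02x" % code)


if __name__ == "__main__":
    for line in ("ipmi0: Watchdog set returned 0xc0",
                 "ipmi0: Attached watchdog"):
        code = parse_watchdog_line(line)
        if code is not None:
            print("0x%02x: %s" % (code, describe(code)))
```

Piping the console log through something like this makes it easier to see whether a change in timeout or pat interval shifts the error from one code to another.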