debugging OS hangs

From: Tom Everett <tom_at_khubla.com>
Date: Sat, 02 Nov 2024 18:02:58 UTC
I have a FreeBSD host running 14.1 which has an uptime of maybe a day or 
two between hangs.  When it hangs there is no console output indicating 
a problem and it no longer responds to ping.  Keyboard input is ignored.

I have attached the output of dmesg below.

I ran a memory test; the RAM chips seem to be fine.  I’ve replaced the 
root disk too, and put in a new network card in the hope that the 
problem was a NIC driver.  The hangs still happen.

Are there any switches I can set to get some clues as to what is 
happening?


---<<BOOT>>---
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
     The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.1-RELEASE-p5 GENERIC amd64
FreeBSD clang version 18.1.5 (https://github.com/llvm/llvm-project.git llvmorg-18.1.5-0-g617a15a9eac9)
VT(efifb): resolution 1024x768
CPU: AMD Ryzen 9 3900X 12-Core Processor             (3793.05-MHz K8-class CPU)
   Origin="AuthenticAMD"  Id=0x870f10  Family=0x17  Model=0x71  Stepping=0
   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
   Features2=0x7ef8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
   AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
   AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,ADMSKX>
   Structured Extended Features=0x219c91a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
   Structured Extended Features2=0x400004<UMIP,RDPID>
   XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
   AMD Extended Feature Extensions ID EBX=0x108b657<CLZERO,IRPerf,XSaveErPtr,RDPRU,WBNOINVD,IBPB,STIBP,SSBD>
   SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
   TSC: P-state invariant, performance statistics
real memory  = 68717379584 (65534 MB)
avail memory = 66761863168 (63669 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 1 package(s) x 4 cache groups x 3 core(s) x 2 hardware threads
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 2.1> irqs 0-23
ioapic1 <Version 2.1> irqs 24-55
Launching APs: 16 17 15 13 8 6 1 7 10 2 18 5 11 20 22 4 3 21 23 19 14 9 12
random: entropy device external interface
kbd1 at kbdmux0
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
smbios0: <System Management BIOS> at iomem 0xbd9f1000-0xbd9f101e
smbios0: Version: 3.3, BCD Revision: 3.3
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
acpi0: <ALASKA A M I >
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 350
Event timer "HPET2" frequency 14318180 Hz quality 350
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <base peripheral, IOMMU> at device 0.2 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 1.1 on pci0
pci1: <ACPI PCI bus> on pcib1
nvme0: <Generic NVMe Device> mem 0xfcf00000-0xfcf03fff irq 24 at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> at device 1.2 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 28 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci3
pci4: <ACPI PCI bus> on pcib4
pci4: <network> at device 0.0 (no driver attached)
pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci3
pci5: <ACPI PCI bus> on pcib5
igb0: <Intel(R) I211 (Copper)> port 0xe000-0xe01f mem 0xfc900000-0xfc91ffff,0xfc920000-0xfc923fff irq 28 at device 0.0 on pci5
igb0: NVM V0.6 imgtype1
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 18:c0:4d:89:1c:0f
igb0: netmap queues/slots: TX 2/1024, RX 2/1024
pcib6: <ACPI PCI-PCI bridge> at device 5.0 on pci3
pci6: <ACPI PCI bus> on pcib6
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xd000-0xd0ff mem 0xfc804000-0xfc804fff,0xfc800000-0xfc803fff irq 29 at device 0.0 on pci6
re0: Using 1 MSI-X message
re0: Chip rev. 0x54000000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 00:e0:4c:69:2f:72
re0: netmap queues/slots: TX 1/256, RX 1/256
pcib7: <ACPI PCI-PCI bridge> at device 6.0 on pci3
pci7: <ACPI PCI bus> on pcib7
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xfc700000-0xfc700fff irq 30 at device 0.0 on pci7
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pcib8: <ACPI PCI-PCI bridge> irq 28 at device 8.0 on pci3
pci8: <ACPI PCI bus> on pcib8
xhci1: <AMD Matisse USB 3.0 controller> mem 0xfc400000-0xfc4fffff irq 28 at device 0.1 on pci8
xhci1: 64 bytes context size, 64-bit DMA
usbus1 on xhci1
usbus1: 5.0Gbps Super Speed USB v3.0
xhci2: <AMD Matisse USB 3.0 controller> mem 0xfc300000-0xfc3fffff irq 30 at device 0.3 on pci8
xhci2: 64 bytes context size, 64-bit DMA
usbus2 on xhci2
usbus2: 5.0Gbps Super Speed USB v3.0
pcib9: <PCI-PCI bridge> irq 29 at device 9.0 on pci3
pci9: <PCI bus> on pcib9
ahci0: <AMD KERNCZ AHCI SATA controller> mem 0xfc600000-0xfc6007ff irq 29 at device 0.0 on pci9
ahci0: AHCI v1.31 with 2 6Gbps ports, Port Multiplier supported with FBS
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
pcib10: <PCI-PCI bridge> irq 30 at device 10.0 on pci3
pci10: <PCI bus> on pcib10
ahci1: <AMD KERNCZ AHCI SATA controller> mem 0xfc500000-0xfc5007ff irq 30 at device 0.0 on pci10
ahci1: AHCI v1.31 with 4 6Gbps ports, Port Multiplier supported with FBS
ahcich4: <AHCI channel> at channel 0 on ahci1
ahcich5: <AHCI channel> at channel 1 on ahci1
ahcich8: <AHCI channel> at channel 4 on ahci1
ahcich9: <AHCI channel> at channel 5 on ahci1
pcib11: <ACPI PCI-PCI bridge> at device 3.1 on pci0
pci11: <ACPI PCI bus> on pcib11
vgapci0: <VGA-compatible display> port 0xf000-0xf0ff mem 0xd0000000-0xdfffffff,0xe0000000-0xe01fffff,0xfce00000-0xfce3ffff irq 54 at device 0.0 on pci11
vgapci0: Boot video device
hdac0: <ATI (0xaaf0) HDA Controller> mem 0xfce60000-0xfce63fff irq 55 at device 0.1 on pci11
pcib12: <ACPI PCI-PCI bridge> at device 7.1 on pci0
pci12: <ACPI PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> at device 8.1 on pci0
pci13: <ACPI PCI bus> on pcib13
pci13: <encrypt/decrypt> at device 0.1 (no driver attached)
xhci3: <AMD Matisse USB 3.0 controller> mem 0xfcb00000-0xfcbfffff irq 39 at device 0.3 on pci13
xhci3: 64 bytes context size, 64-bit DMA
usbus3 on xhci3
usbus3: 5.0Gbps Super Speed USB v3.0
hdac1: <AMD X570 HDA Controller> mem 0xfcd00000-0xfcd07fff irq 36 at device 0.4 on pci13
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
acpi_button0: <Power Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
acpi_tz2: <Thermal Zone> on acpi0
orm0: <ISA Option ROM> at iomem 0xc0000-0xce7ff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbdc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 15.
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounter "TSC-low" frequency 1896437301 Hz quality 1000
Timecounters tick every 1.000 msec
hdacc0: <ATI R6xx HDA CODEC> at cad 0 on hdac0
hdaa0: <ATI R6xx Audio Function Group> at nid 1 on hdacc0
pcm0: <ATI R6xx (HDMI)> at nid 3 on hdaa0
pcm1: <ATI R6xx (HDMI)> at nid 5 on hdaa0
pcm2: <ATI R6xx (HDMI)> at nid 7 on hdaa0
pcm3: <ATI R6xx (HDMI)> at nid 9 on hdaa0
pcm4: <ATI R6xx (HDMI)> at nid 11 on hdaa0
pcm5: <ATI R6xx (HDMI)> at nid 13 on hdaa0
hdacc1: <Realtek ALCS1200A HDA CODEC> at cad 0 on hdac1
hdaa1: <Realtek ALCS1200A Audio Function Group> at nid 1 on hdacc1
pcm6: <Realtek ALCS1200A (Rear Analog 5.1/2.0)> at nid 20,22,21 and 24,26 on hdaa1
pcm7: <Realtek ALCS1200A (Front Analog)> at nid 27 and 25 on hdaa1
pcm8: <Realtek ALCS1200A (Rear Digital)> at nid 30 on hdaa1
ugen3.1: <AMD XHCI root HUB> at usbus3
Trying to mount root from ufs:/dev/nda0p2 [rw]...
ugen1.1: <AMD XHCI root HUB> at usbus1
uhub0 on usbus1
uhub0: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
ugen2.1: <AMD XHCI root HUB> at usbus2
uhub1 on usbus3
uhub1: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus3
uhub2 on usbus2
uhub2: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus2
ugen0.1: <(0x1106) XHCI root HUB> at usbus0
uhub3 on usbus0
uhub3: <(0x1106) XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub3: 5 ports with 4 removable, self powered
nda0 at nvme0 bus 0 scbus6 target 0 lun 1
nda0: <WD_BLACK SN850X 1000GB 620331WD 233204800587>
nda0: Serial Number 233204800587
nda0: nvme version 1.4
nda0: 953869MB (1953525168 512 byte sectors)
ada0 at ahcich4 bus 0 scbus2 target 0 lun 0
ada0: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
ada0: Serial Number WD-WX12D50LH5YY
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 5723166MB (11721045168 512 byte sectors)
ada0: quirks=0x1<4K>
ada1 at ahcich5 bus 0 scbus3 target 0 lun 0
ada1: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
ada1: Serial Number WD-WX32D50E53LJ
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 5723166MB (11721045168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich8 bus 0 scbus4 target 0 lun 0
ada2: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
ada2: Serial Number WD-WX72D507J792
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 5723166MB (11721045168 512 byte sectors)
ada2: quirks=0x1<4K>
ada3 at ahcich9 bus 0 scbus5 target 0 lun 0
ada3: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
ada3: Serial Number WD-WX72D507J8YK
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 5723166MB (11721045168 512 byte sectors)
ada3: quirks=0x1<4K>
uhub1: 8 ports with 8 removable, self powered
uhub0: 10 ports with 10 removable, self powered
uhub2: 10 ports with 10 removable, self powered
ugen0.2: <vendor 0x2109 USB2.0 Hub> at usbus0
uhub4 on uhub3
uhub4: <vendor 0x2109 USB2.0 Hub, class 9/0, rev 2.10/4.20, addr 1> on usbus0
ugen1.2: <vendor 0x05e3 USB2.0 Hub> at usbus1
uhub5 on uhub0
uhub5: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/85.36, addr 1> on usbus1
ugen2.2: <vendor 0x8087 product 0x0025> at usbus2
ugen2.3: <vendor 0x05e3 USB2.0 Hub> at usbus2
uhub6 on uhub2
uhub6: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/85.36, addr 2> on usbus2
uhub4: 4 ports with 4 removable, self powered
Root mount waiting for: usbus1 usbus2
uhub5: 4 ports with 4 removable, self powered
uhub6: 4 ports with 4 removable, self powered
ugen1.3: <ITE Tech. Inc. ITE Device(8595)> at usbus1
ukbd0 on uhub0
ukbd0: <ITE Tech. Inc. ITE Device(8595), class 0/0, rev 2.00/0.03, addr 2> on usbus1
kbd2 at ukbd0
ugen2.4: <Cooler Master Technology Inc. AMD SR4 lamplight Control> at usbus2
ukbd1 on uhub6
ukbd1: <Cooler Master Technology Inc. AMD SR4 lamplight Control, class 0/0, rev 2.00/11.01, addr 3> on usbus2
kbd3 at ukbd1
WARNING: / was not properly dismounted
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Intel(R) Wireless WiFi based driver for FreeBSD
intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
smbus0: <System Management Bus> on intsmb0
iwm0: <Intel(R) Dual Band Wireless AC 9260> mem 0xfca00000-0xfca03fff irq 31 at device 0.0 on pci4
iwm0: hw rev 0x320, fw ver 34.3125811985.0, address bc:17:b8:b7:8b:df
acpi_wmi0: <ACPI-WMI mapping> on acpi0
acpi_wmi0: cannot find EC device
acpi_wmi0: Embedded MOF found
ACPI: \134GSA1.WQCC: 1 arguments were passed to a non-method ACPI object (Buffer) (20221020/nsarguments-361)
acpi_wmi1: <ACPI-WMI mapping> on acpi0
acpi_wmi1: cannot find EC device
acpi_wmi1: Embedded MOF found
ACPI: \134AOD.WQBA: 1 arguments were passed to a non-method ACPI object (Buffer) (20221020/nsarguments-361)
driver bug: Unable to set devclass (class: ppc devname: (unknown))
re0: link state changed to UP
lo0: link state changed to UP
re0: link state changed to DOWN
pflog0: promiscuous mode enabled
uhid0 on uhub6
uhid0: <Cooler Master Technology Inc. AMD SR4 lamplight Control, class 0/0, rev 2.00/11.01, addr 3> on usbus2
uhid1 on uhub6
uhid1: <Cooler Master Technology Inc. AMD SR4 lamplight Control, class 0/0, rev 2.00/11.01, addr 3> on usbus2
re0: link state changed to UP