Re: debugging OS hangs

From: Edward Sanford Sutton, III <mirror176_at_hotmail.com>
Date: Sun, 03 Nov 2024 01:46:11 UTC
On 11/2/24 11:02, Tom Everett wrote:
> I have a FreeBSD host running 14.1 which has an uptime of maybe a day or 
> two between hangs.  When it hangs there is no console output indicating 
> a problem and it no longer responds to ping.  Keyboard input is ignored.

In some bad states my machine becomes so slow to respond that, after I 
tap the power button on the case to request a shutdown, several minutes 
can pass before it actually shuts down; it eventually gets there by 
timing out processes.

> I have attached the output of dmesg below.
> 
> I ran a memory test; the RAM chips seem to be fine.  I’ve replaced the 
> root disk too, and put in a new network card in the hope that the 
> problem was a nic driver.  The hangs still happen.

Testing is good but sometimes won't reveal every issue the hardware has. 
Other possibilities include a power supply that cannot handle certain 
loads or load changes, or temperature-related issues that appear only 
when certain combinations of hardware are loaded or once the machine is 
up to temperature. That said, software problems are certainly possible.

> Are there any switches I can set to get some clues as to what is happening?

If it is hanging, it may not be in a state where it tries to, or can, 
write a dump file, but you could try enabling that if it isn't already. 
Do you have a swap partition? Leading up to the lockup, was there much 
demand for RAM, or was a lot of it free/cache? Did anything else notable 
show up in the logs before the lockup? You could start logging more and 
more things while trying to track down specific activity, such as 
writing top output to a log at regular intervals. DTrace is a tool that 
can dig into much more detail about activity while things are going on, 
but it may impact performance depending on what it's doing. Are you 
using wireless or just wired networking?
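
As a rough sketch of the "top output to a log" idea, something like the 
following could work. It assumes Python 3 is installed; the function 
names, the log path and the interval are only placeholders I made up, 
not anything standard. Each snapshot is flushed and fsync'd so the last 
entry has a better chance of surviving a hard hang.

```python
import datetime
import os
import subprocess
import time


def log_snapshot(command, log_path):
    """Run command once and append its output, with a UTC timestamp, to log_path."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
        output = result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of stopping the logger.
        output = f"command failed: {exc}\n"
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} ===\n{output}\n")
        log.flush()
        os.fsync(log.fileno())  # ask the OS to write the data to disk now


def log_periodically(command, log_path, interval, count=None):
    """Take a snapshot every `interval` seconds; count=None means run until killed."""
    done = 0
    while count is None or done < count:
        log_snapshot(command, log_path)
        done += 1
        time.sleep(interval)
```

For example, log_periodically(["top", "-b", "-d", "1"], 
"/var/log/top-snapshots.log", 60) would record one batch-mode top 
display per minute (path and interval are again just examples). After 
the next hang, the tail of that file shows what was running shortly 
before the lockup.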

Is there any relevant machine history, such as it working before an OS 
or program upgrade, BIOS/UEFI update, or hardware change, or is it a new 
install?

Even before knowing more, I'd also look for BIOS updates and SSD 
firmware updates. I think there were some recent code changes that 
affected a possible lockup with TRIM, but I thought that involved only 
SATA SSDs.

> ---<<BOOT>>---
> Copyright (c) 1992-2023 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>      The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 14.1-RELEASE-p5 GENERIC amd64
> FreeBSD clang version 18.1.5 (https://github.com/llvm/llvm-project.git 
> llvmorg-18.1.5-0-g617a15a9eac9)
> VT(efifb): resolution 1024x768
> CPU: AMD Ryzen 9 3900X 12-Core Processor             (3793.05-MHz K8- 
> class CPU)
>    Origin="AuthenticAMD"  Id=0x870f10  Family=0x17  Model=0x71 Stepping=0
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x7ef8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>    AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
>    AMD 
> Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,ADMSKX>
>    Structured Extended 
> Features=0x219c91a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
>    Structured Extended Features2=0x400004<UMIP,RDPID>
>    XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
>    AMD Extended Feature Extensions ID 
> EBX=0x108b657<CLZERO,IRPerf,XSaveErPtr,RDPRU,WBNOINVD,IBPB,STIBP,SSBD>
>    SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
>    TSC: P-state invariant, performance statistics
> real memory  = 68717379584 (65534 MB)
> avail memory = 66761863168 (63669 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: <ALASKA A M I >
> FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
> FreeBSD/SMP: 1 package(s) x 4 cache groups x 3 core(s) x 2 hardware threads
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> random: unblocking device.
> ioapic0 <Version 2.1> irqs 0-23
> ioapic1 <Version 2.1> irqs 24-55
> Launching APs: 16 17 15 13 8 6 1 7 10 2 18 5 11 20 22 4 3 21 23 19 14 9 12
> random: entropy device external interface
> kbd1 at kbdmux0
> efirtc0: <EFI Realtime Clock>
> efirtc0: registered as a time-of-day clock, resolution 1.000000s
> smbios0: <System Management BIOS> at iomem 0xbd9f1000-0xbd9f101e
> smbios0: Version: 3.3, BCD Revision: 3.3
> aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
> acpi0: <ALASKA A M I >
> acpi0: Power Button (fixed)
> cpu0: <ACPI CPU> on acpi0
> attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
> atrtc0: registered as a time-of-day clock, resolution 1.000000s
> Event timer "RTC" frequency 32768 Hz quality 0
> hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 
> on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 950
> Event timer "HPET" frequency 14318180 Hz quality 350
> Event timer "HPET1" frequency 14318180 Hz quality 350
> Event timer "HPET2" frequency 14318180 Hz quality 350
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pci0: <base peripheral, IOMMU> at device 0.2 (no driver attached)
> pcib1: <ACPI PCI-PCI bridge> at device 1.1 on pci0
> pci1: <ACPI PCI bus> on pcib1
> nvme0: <Generic NVMe Device> mem 0xfcf00000-0xfcf03fff irq 24 at device 
> 0.0 on pci1
> pcib2: <ACPI PCI-PCI bridge> at device 1.2 on pci0
> pci2: <ACPI PCI bus> on pcib2
> pcib3: <ACPI PCI-PCI bridge> irq 28 at device 0.0 on pci2
> pci3: <ACPI PCI bus> on pcib3
> pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci3
> pci4: <ACPI PCI bus> on pcib4
> pci4: <network> at device 0.0 (no driver attached)
> pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci3
> pci5: <ACPI PCI bus> on pcib5
> igb0: <Intel(R) I211 (Copper)> port 0xe000-0xe01f mem 
> 0xfc900000-0xfc91ffff,0xfc920000-0xfc923fff irq 28 at device 0.0 on pci5
> igb0: NVM V0.6 imgtype1
> igb0: Using 1024 TX descriptors and 1024 RX descriptors
> igb0: Using 2 RX queues 2 TX queues
> igb0: Using MSI-X interrupts with 3 vectors
> igb0: Ethernet address: 18:c0:4d:89:1c:0f
> igb0: netmap queues/slots: TX 2/1024, RX 2/1024
> pcib6: <ACPI PCI-PCI bridge> at device 5.0 on pci3
> pci6: <ACPI PCI bus> on pcib6
> re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 
> 0xd000-0xd0ff mem 0xfc804000-0xfc804fff,0xfc800000-0xfc803fff irq 29 at 
> device 0.0 on pci6
> re0: Using 1 MSI-X message
> re0: Chip rev. 0x54000000
> re0: MAC rev. 0x00100000
> miibus0: <MII bus> on re0
> rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 1 on miibus0
> rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 
> 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 
> 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> re0: Using defaults for TSO: 65518/35/2048
> re0: Ethernet address: 00:e0:4c:69:2f:72
> re0: netmap queues/slots: TX 1/256, RX 1/256
> pcib7: <ACPI PCI-PCI bridge> at device 6.0 on pci3
> pci7: <ACPI PCI bus> on pcib7
> xhci0: <XHCI (generic) USB 3.0 controller> mem 0xfc700000-0xfc700fff irq 
> 30 at device 0.0 on pci7
> xhci0: 32 bytes context size, 64-bit DMA
> usbus0 on xhci0
> usbus0: 5.0Gbps Super Speed USB v3.0
> pcib8: <ACPI PCI-PCI bridge> irq 28 at device 8.0 on pci3
> pci8: <ACPI PCI bus> on pcib8
> xhci1: <AMD Matisse USB 3.0 controller> mem 0xfc400000-0xfc4fffff irq 28 
> at device 0.1 on pci8
> xhci1: 64 bytes context size, 64-bit DMA
> usbus1 on xhci1
> usbus1: 5.0Gbps Super Speed USB v3.0
> xhci2: <AMD Matisse USB 3.0 controller> mem 0xfc300000-0xfc3fffff irq 30 
> at device 0.3 on pci8
> xhci2: 64 bytes context size, 64-bit DMA
> usbus2 on xhci2
> usbus2: 5.0Gbps Super Speed USB v3.0
> pcib9: <PCI-PCI bridge> irq 29 at device 9.0 on pci3
> pci9: <PCI bus> on pcib9
> ahci0: <AMD KERNCZ AHCI SATA controller> mem 0xfc600000-0xfc6007ff irq 
> 29 at device 0.0 on pci9
> ahci0: AHCI v1.31 with 2 6Gbps ports, Port Multiplier supported with FBS
> ahcich2: <AHCI channel> at channel 2 on ahci0
> ahcich3: <AHCI channel> at channel 3 on ahci0
> pcib10: <PCI-PCI bridge> irq 30 at device 10.0 on pci3
> pci10: <PCI bus> on pcib10
> ahci1: <AMD KERNCZ AHCI SATA controller> mem 0xfc500000-0xfc5007ff irq 
> 30 at device 0.0 on pci10
> ahci1: AHCI v1.31 with 4 6Gbps ports, Port Multiplier supported with FBS
> ahcich4: <AHCI channel> at channel 0 on ahci1
> ahcich5: <AHCI channel> at channel 1 on ahci1
> ahcich8: <AHCI channel> at channel 4 on ahci1
> ahcich9: <AHCI channel> at channel 5 on ahci1
> pcib11: <ACPI PCI-PCI bridge> at device 3.1 on pci0
> pci11: <ACPI PCI bus> on pcib11
> vgapci0: <VGA-compatible display> port 0xf000-0xf0ff mem 
> 0xd0000000-0xdfffffff,0xe0000000-0xe01fffff,0xfce00000-0xfce3ffff irq 54 
> at device 0.0 on pci11
> vgapci0: Boot video device
> hdac0: <ATI (0xaaf0) HDA Controller> mem 0xfce60000-0xfce63fff irq 55 at 
> device 0.1 on pci11
> pcib12: <ACPI PCI-PCI bridge> at device 7.1 on pci0
> pci12: <ACPI PCI bus> on pcib12
> pcib13: <ACPI PCI-PCI bridge> at device 8.1 on pci0
> pci13: <ACPI PCI bus> on pcib13
> pci13: <encrypt/decrypt> at device 0.1 (no driver attached)
> xhci3: <AMD Matisse USB 3.0 controller> mem 0xfcb00000-0xfcbfffff irq 39 
> at device 0.3 on pci13
> xhci3: 64 bytes context size, 64-bit DMA
> usbus3 on xhci3
> usbus3: 5.0Gbps Super Speed USB v3.0
> hdac1: <AMD X570 HDA Controller> mem 0xfcd00000-0xfcd07fff irq 36 at 
> device 0.4 on pci13
> isab0: <PCI-ISA bridge> at device 20.3 on pci0
> isa0: <ISA bus> on isab0
> acpi_button0: <Power Button> on acpi0
> acpi_tz0: <Thermal Zone> on acpi0
> acpi_tz1: <Thermal Zone> on acpi0
> acpi_tz2: <Thermal Zone> on acpi0
> orm0: <ISA Option ROM> at iomem 0xc0000-0xce7ff pnpid ORM0000 on isa0
> atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> atkbdc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 15.
> hwpstate0: <Cool`n'Quiet 2.0> on cpu0
> Timecounter "TSC-low" frequency 1896437301 Hz quality 1000
> Timecounters tick every 1.000 msec
> hdacc0: <ATI R6xx HDA CODEC> at cad 0 on hdac0
> hdaa0: <ATI R6xx Audio Function Group> at nid 1 on hdacc0
> pcm0: <ATI R6xx (HDMI)> at nid 3 on hdaa0
> pcm1: <ATI R6xx (HDMI)> at nid 5 on hdaa0
> pcm2: <ATI R6xx (HDMI)> at nid 7 on hdaa0
> pcm3: <ATI R6xx (HDMI)> at nid 9 on hdaa0
> pcm4: <ATI R6xx (HDMI)> at nid 11 on hdaa0
> pcm5: <ATI R6xx (HDMI)> at nid 13 on hdaa0
> hdacc1: <Realtek ALCS1200A HDA CODEC> at cad 0 on hdac1
> hdaa1: <Realtek ALCS1200A Audio Function Group> at nid 1 on hdacc1
> pcm6: <Realtek ALCS1200A (Rear Analog 5.1/2.0)> at nid 20,22,21 and 
> 24,26 on hdaa1
> pcm7: <Realtek ALCS1200A (Front Analog)> at nid 27 and 25 on hdaa1
> pcm8: <Realtek ALCS1200A (Rear Digital)> at nid 30 on hdaa1
> ugen3.1: <AMD XHCI root HUB> at usbus3
> Trying to mount root from ufs:/dev/nda0p2 [rw]...
> ugen1.1: <AMD XHCI root HUB> at usbus1
> uhub0 on usbus1
> uhub0: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
> ugen2.1: <AMD XHCI root HUB> at usbus2
> uhub1 on usbus3
> uhub1: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus3
> uhub2 on usbus2
> uhub2: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus2
> ugen0.1: <(0x1106) XHCI root HUB> at usbus0
> uhub3 on usbus0
> uhub3: <(0x1106) XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
> uhub3: 5 ports with 4 removable, self powered
> nda0 at nvme0 bus 0 scbus6 target 0 lun 1
> nda0: <WD_BLACK SN850X 1000GB 620331WD 233204800587>
> nda0: Serial Number 233204800587
> nda0: nvme version 1.4
> nda0: 953869MB (1953525168 512 byte sectors)
> ada0 at ahcich4 bus 0 scbus2 target 0 lun 0
> ada0: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
> ada0: Serial Number WD-WX12D50LH5YY
> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 5723166MB (11721045168 512 byte sectors)
> ada0: quirks=0x1<4K>
> ada1 at ahcich5 bus 0 scbus3 target 0 lun 0
> ada1: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
> ada1: Serial Number WD-WX32D50E53LJ
> ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 5723166MB (11721045168 512 byte sectors)
> ada1: quirks=0x1<4K>
> ada2 at ahcich8 bus 0 scbus4 target 0 lun 0
> ada2: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
> ada2: Serial Number WD-WX72D507J792
> ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
> ada2: 5723166MB (11721045168 512 byte sectors)
> ada2: quirks=0x1<4K>
> ada3 at ahcich9 bus 0 scbus5 target 0 lun 0
> ada3: <WDC WD60EFRX-68L0BN1 82.00A82> ACS-2 ATA SATA 3.x device
> ada3: Serial Number WD-WX72D507J8YK
> ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada3: Command Queueing enabled
> ada3: 5723166MB (11721045168 512 byte sectors)
> ada3: quirks=0x1<4K>
> uhub1: 8 ports with 8 removable, self powered
> uhub0: 10 ports with 10 removable, self powered
> uhub2: 10 ports with 10 removable, self powered
> ugen0.2: <vendor 0x2109 USB2.0 Hub> at usbus0
> uhub4 on uhub3
> uhub4: <vendor 0x2109 USB2.0 Hub, class 9/0, rev 2.10/4.20, addr 1> on 
> usbus0
> ugen1.2: <vendor 0x05e3 USB2.0 Hub> at usbus1
> uhub5 on uhub0
> uhub5: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/85.36, addr 1> on 
> usbus1
> ugen2.2: <vendor 0x8087 product 0x0025> at usbus2
> ugen2.3: <vendor 0x05e3 USB2.0 Hub> at usbus2
> uhub6 on uhub2
> uhub6: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/85.36, addr 2> on 
> usbus2
> uhub4: 4 ports with 4 removable, self powered
> Root mount waiting for: usbus1 usbus2
> uhub5: 4 ports with 4 removable, self powered
> uhub6: 4 ports with 4 removable, self powered
> ugen1.3: <ITE Tech. Inc. ITE Device(8595)> at usbus1
> ukbd0 on uhub0
> ukbd0: <ITE Tech. Inc. ITE Device(8595), class 0/0, rev 2.00/0.03, addr 
> 2> on usbus1
> kbd2 at ukbd0
> ugen2.4: <Cooler Master Technology Inc. AMD SR4 lamplight Control> at 
> usbus2
> ukbd1 on uhub6
> ukbd1: <Cooler Master Technology Inc. AMD SR4 lamplight Control, class 
> 0/0, rev 2.00/11.01, addr 3> on usbus2
> kbd3 at ukbd1
> WARNING: / was not properly dismounted
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> Intel(R) Wireless WiFi based driver for FreeBSD
> intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
> smbus0: <System Management Bus> on intsmb0
> iwm0: <Intel(R) Dual Band Wireless AC 9260> mem 0xfca00000-0xfca03fff 
> irq 31 at device 0.0 on pci4
> iwm0: hw rev 0x320, fw ver 34.3125811985.0, address bc:17:b8:b7:8b:df
> acpi_wmi0: <ACPI-WMI mapping> on acpi0
> acpi_wmi0: cannot find EC device
> acpi_wmi0: Embedded MOF found
> ACPI: \134GSA1.WQCC: 1 arguments were passed to a non-method ACPI object 
> (Buffer) (20221020/nsarguments-361)
> acpi_wmi1: <ACPI-WMI mapping> on acpi0
> acpi_wmi1: cannot find EC device
> acpi_wmi1: Embedded MOF found
> ACPI: \134AOD.WQBA: 1 arguments were passed to a non-method ACPI object 
> (Buffer) (20221020/nsarguments-361)
> driver bug: Unable to set devclass (class: ppc devname: (unknown))
> re0: link state changed to UP
> lo0: link state changed to UP
> re0: link state changed to DOWN
> pflog0: promiscuous mode enabled
> uhid0 on uhub6
> uhid0: <Cooler Master Technology Inc. AMD SR4 lamplight Control, class 
> 0/0, rev 2.00/11.01, addr 3> on usbus2
> uhid1 on uhub6
> uhid1: <Cooler Master Technology Inc. AMD SR4 lamplight Control, class 
> 0/0, rev 2.00/11.01, addr 3> on usbus2
> re0: link state changed to UP