[Bug 273151] Kernel panic caused by audio driver on cold boot - AMD Ryzen 9 7900

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 15 Aug 2023 19:10:04 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273151

            Bug ID: 273151
           Summary: Kernel panic caused by audio driver on cold boot - AMD
                    Ryzen 9 7900
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: matei35@yahoo.com

Created attachment 244128
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=244128&action=edit
Shows the DELAY lines inserted in trap.c

Attached are the following files:
dmesg.boot_with_audio_driver
dmesg.boot_without_audio_driver
kernel_panic.MOV
kernel_panic_screen_messages.jpg
kernel_panic_with_DELAY.MOV
kernel_without_audio_driver.MOV
trap.c_diff

Desktop components:
cpu: AMD Ryzen 9 7900
motherboard: Gigabyte B650 Aorus Elite AX
memory: Corsair 32 GB (2x16GB)
storage: SSD Gigabyte 1 TB

ZFS is being used with 2GB swap.

For this panic, no kernel crash dump is created in /var/crash.
When I simulate a kernel crash with sysctl debug.kdb.panic, the dump
is created in /var/crash as expected.

Replacing the Corsair memory with G.Skill and installing FreeBSD
on an HDD instead of the SSD does not make a difference.

When cold booting, the kernel panics as shown in kernel_panic.MOV.
After the panic, the system reboots and then boots normally.

The call stack in /usr/src/sys/amd64/amd64/trap.c:
trap_fatal()   called from line 795
trap_pfault()  called from line 385
trap()         called from line 665
trap_check()   called from /usr/src/sys/amd64/amd64/exception.S, line 290

I changed the file /usr/src/sys/amd64/amd64/trap.c as shown in
trap.c_diff in order to get a better view of the screen messages as
you can see in kernel_panic_with_DELAY.MOV.

The screen messages can also be seen in the file
kernel_panic_screen_messages.jpg and are also shown below:
...
acpi_tz0: <Thermal Zone> on acpi0
cpu0: <ACPI CPU> on acpi0
hwpstate0: <Cool'n'Quiet 2.0> on cpu0
Timecounter "TSC-low" frequency 1846531959 Hz quality 1000

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80972392
stack pointer           = 0x20:0xfffffe01072d5de0
frame pointer           = 0x20:0xfffffe01072d5e00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                          DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq78: hdac1) <------------- AAAAA
trap number             = 12


TRAPFRAME:
   tf_rdi = -8796050178040
   tf_rsi = 0
   tf_rdx = 1
   tf_rcx = -8796054324096
   tf_r8  = -2194564528880
   tf_r9  = -2194607874048
   tf_rax = 1
   tf_rbx = -8796050178048
   tf_rbp = -2194607874560
   tf_r10 = 2000
   tf_r11 = 2146883647
   tf_r12 = 233
   tf_r13 = -8796050178048
   tf_r14 = 0
   tf_r15 = 0
   tf_trapno = 12
   tf_fs = 19
   tf_gs = 40
   tf_addr = 8
   tf_flags = 1
   tf_es = 59
   tf_ds = 59
   tf_err = 0
   tf_rip = -2137578606
   tf_cs = 32
   tf_rflags = 66066
   tf_rsp = -2194607874592
   tf_ss = 40
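
The TRAPFRAME values above are printed as signed decimals, which makes them
hard to compare with the hexadecimal addresses in the panic header. A minimal
sketch of the conversion, assuming the values are signed 64-bit integers (the
helper name to_hex64 is hypothetical, not part of any FreeBSD tool):

```python
def to_hex64(value: int) -> str:
    """Render a signed 64-bit integer as unsigned hexadecimal."""
    # Masking with 2**64 - 1 gives the two's-complement bit pattern.
    return hex(value & 0xFFFFFFFFFFFFFFFF)


# Values copied from the TRAPFRAME dump in this report.
trapframe = {
    "tf_rip": -2137578606,
    "tf_rsp": -2194607874592,
    "tf_rbp": -2194607874560,
}

for name, value in trapframe.items():
    print(f"{name} = {to_hex64(value)}")
```

With this conversion, tf_rip, tf_rsp and tf_rbp correspond to the instruction
pointer (0xffffffff80972392), stack pointer (0xfffffe01072d5de0) and frame
pointer (0xfffffe01072d5e00) shown in the panic header.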

Workaround:
Based on the line marked with "AAAAA" above and dmesg.boot_with_audio_driver, I
disabled the audio driver by adding the following lines to /boot/device.hints:
hint.pcm.4.disabled="1"
hint.pcm.5.disabled="1"
hint.hdaa.1.disabled="1"
hint.hdacc.1.disabled="1"
hint.hdac.1.disabled="1"
and the kernel no longer panics as shown in kernel_without_audio_driver.MOV.
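
The hint lines above follow the pattern hint.<driver>.<unit>.disabled="1". As
an illustration only, the following sketch generates such lines from device
names like pcm4 or hdac1. The function name disable_hints is hypothetical, and
the sketch assumes each device name is a lowercase driver name followed by a
unit number:

```python
import re


def disable_hints(devices):
    """Return device.hints lines that disable the given devices."""
    lines = []
    for dev in devices:
        # Assumption: a name is lowercase letters/underscores plus a unit number.
        match = re.fullmatch(r"([a-z_]+)(\d+)", dev)
        if match is None:
            raise ValueError(f"unrecognized device name: {dev!r}")
        driver, unit = match.groups()
        lines.append(f'hint.{driver}.{unit}.disabled="1"')
    return lines


# Device names taken from the hint lines in this report.
for line in disable_hints(["pcm4", "pcm5", "hdaa1", "hdacc1", "hdac1"]):
    print(line)
```

The output reproduces the five lines added to /boot/device.hints in the
workaround.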
