[Bug 283285] Kernel panic at boot on Intel Atom C3758 w/ QAT module

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 12 Dec 2024 17:29:00 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283285

            Bug ID: 283285
           Summary: Kernel panic at boot on Intel Atom C3758 w/ QAT module
           Product: Base System
           Version: 14.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: me@benschumacher.com

After attempting to upgrade my system to 14.2-RELEASE, I've encountered a crash
that appears to be related to the qat.ko driver. Strangely, it seems I am able
to load the module after boot, but when I have it enabled in my loader.conf,
the kernel crashes.

I have not been able to successfully produce a dump, despite attempting to
manually assign a dumpdev in the loader. Also, I cannot interact with the
system from the console after the panic, though I don't entirely understand
why, since I do have a USB keyboard attached.
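
For reference, assigning a dump device from the loader generally looks like
the sketch below. The device name is a placeholder, not the partition on this
machine, and the exact setting used in this report is not recorded. dumpon(8)
describes the dumpdev loader variable as the way to enable dumps for panics
that occur before userspace starts.

```sh
# /boot/loader.conf (sketch only)
# /dev/ada0p2 is a hypothetical swap partition; substitute the real one.
dumpdev="/dev/ada0p2"
```

The same variable can also be set for a single boot from the loader prompt
with "set dumpdev=/dev/ada0p2" before booting.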

This text is copied from a picture I took of my console:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 08
fault virtual address   = 0x4
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8087e352
stack pointer           = 0x28:0xfffffe00e1f679b0
frame pointer           = 0x28:0xfffffe00e1f67a70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (firmware taskq)
rdi: fffffe00e1f67cf0 rsi: fffff80001bf8c01 rdx: fffff80001bf8c00
rcx: fffffe00e1f67d70  r8: 00000000000003e3  r9: 0000000000000000
rax: 0000000000000000 rbx: fffffe00e1f67cf0 rbp: fffffe00e1f67a70
r10: fffff80001c7de90 r11: 0000000000000003 r12: fffffe00e1f67a94
r13: 0000000000000000 r14: fffffe00e1f67d60 r15: fffff80001956740
trap number             = 12
panic: page fault
cpuid = 2
time = 3
KDB: stack backtrace:
#0 0xffffffff8080313d at kdb_backtrace+0x5d
#1 0xffffffff807b6be9 at vpanic+0x169
#2 0xffffffff807b6a73 at panic+0x43
#3 0xffffffff80bcf00d at trap_fatal+0x3fd
#4 0xffffffff80bcf056 at trap_pfault+0x46
#5 0xffffffff80ba9788 at calltrap+0x8
#6 0xffffffff80889a14 at namei+0x104
#7 0xffffffff808aefae at vn_open_cred+0x55e
#8 0xffffffff807fef95 at loadimage+0x235
#9 0xffffffff808175c1 at taskqueue_run_locked+0x191
#10 0xffffffff80818852 at taskqueue_thread_loop+0xc2
#11 0xffffffff80771f2f at fork_exit+0x7f
#12 0xffffffff80baa7ee at fork_trampoline+0xe
Uptime: 3s
Automatic reboot in 15 seconds - press a key on the console to abort

I diagnosed this by commenting out all of the _load statements in my
/boot/loader.conf, and then enabling them one-by-one. Leaving qat_load and
qat_c3xxx_fw_load commented out allowed me to boot.

# use Intel QAT
#qat_c3xxx_fw_load="YES"    # BFS 2024-12-12 
#qat_load="YES"             # BFS 2024-12-12 
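
The one-by-one approach described above is easier to keep track of with a
list of which modules are currently enabled. The following is a minimal
sketch of such a helper; the function name enabled_loader_modules is made up
for illustration, and it assumes the usual <name>_load="YES" form with the
value in double quotes.

```sh
# Print the names of modules that a loader.conf-style file enables via
# <name>_load="YES". Lines commented out with a leading '#' are skipped.
# The function name is hypothetical; the file defaults to /boot/loader.conf.
enabled_loader_modules() {
    conf="${1:-/boot/loader.conf}"
    sed -n -E 's/^[[:space:]]*([A-Za-z0-9_]+)_load="[Yy][Ee][Ss]".*$/\1/p' "$conf"
}
```

Running enabled_loader_modules /boot/loader.conf after each change shows what
the next boot will preload. With the two QAT lines commented out as above,
neither qat nor qat_c3xxx_fw appears in the output.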

But I am able to load these modules from the command-line after boot:

$ kldload qat_c3xxx_fw
$ kldload qat
$ kldstat -v

... cut for space ...

30    1 0xffffffff83545000   122c20 qat_c3xxx_fw.ko (/boot/kernel/qat_c3xxx_fw.ko)
        Contains modules:
                 Id Name
                404 qat_c3xxx_fw_fw
31    1 0xffffffff830e3000     4390 qat.ko (/boot/kernel/qat.ko)
        Contains modules:
                 Id Name
                414 nexus/qat
32    6 0xffffffff830e8000    15dd0 qat_hw.ko (/boot/kernel/qat_hw.ko)
        Contains modules:
                 Id Name
                413 pci/qat_c4xxx
                408 pci/qat_200xx
                412 pci/qat_dh895xcc
                409 pci/qat_4xxx
                411 pci/qat_c3xxx
                407 pci/qat_c62x
                410 pci/qat_4xxxvf
33    9 0xffffffff830fe000    30010 qat_common.ko (/boot/kernel/qat_common.ko)
        Contains modules:
                 Id Name
                405 qat_common
34    8 0xffffffff8312f000    68cd8 qat_api.ko (/boot/kernel/qat_api.ko)
        Contains modules:
                 Id Name
                406 qat_api

I do have a custom kernel, though this is mostly to remove a bunch of devices
that I do not use. This system acts as a NAS/VM host within my homelab. It is a
Supermicro A2SDi-8C+-HLN4F with 32 GB of ECC RAM.

The QAT functionality isn't strictly required for me, so I've left the module
disabled at boot, not that this machine is frequently restarted.
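
Since the modules load cleanly once the system is up, a possible workaround is
to load them from rc(8) instead of the loader, using the kld_list variable in
rc.conf(5). This is a sketch that has not been tested across a reboot in this
report, and it assumes no other kld_list entry already exists.

```sh
# /etc/rc.conf (sketch only, untested on this system)
# Load the QAT firmware and driver from rc(8) after the root file system is
# mounted, rather than preloading them from /boot/loader.conf.
kld_list="qat_c3xxx_fw qat"
```

If kld_list is already set, the two module names would be appended to the
existing value rather than replacing it.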

I'm happy to try to help further diagnose this if I can.

Thanks.
