[Bug 282373] kernel panic on boot with Chelsio T320 installed

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 28 Oct 2024 01:27:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=282373

            Bug ID: 282373
           Summary: kernel panic on boot with Chelsio T320 installed
           Product: Base System
           Version: 14.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: biscuits.carry.0j@icloud.com

Both the "14.1-RELEASE releng/14.1-n267679-10e31f0946d8 GENERIC amd64" and
"14.2-PRERELEASE stable/14-n269296-5ae76ff5138e GENERIC amd64" kernels panic at
the same point during startup on a system with a Chelsio T320 NIC fitted with
two 10GBASE-SR SFP+ modules.

This crash appears to be triggered by something in the OPNsense 24.7 startup,
but userspace doesn't appear to be doing anything unreasonable at the time.
The console output leading up to the panic is:

Starting device manager...
acpi_wmi0: <ACPI-WMI mapping> on acpi0
acpi_wmi0: Embedded MOF found
ACPI: \_SB.WMIB.WQZZ: 1 arguments were passed to a non-method ACPI object
(Buffer) (20221020/nsarguments-361)
acpi_wmi1: <ACPI-WMI mapping> on acpi0
acpi_wmi1: Embedded MOF found
ACPI: \_SB.WMIV.WQZZ: 1 arguments were passed to a non-method ACPI object
(Buffer) (20221020/nsarguments-361)
acpi_wmi2: <ACPI-WMI mapping> on acpi0
acpi_wmi2: Embedded MOF found
cxgbc0: <Chelsio T320, 2 ports> mem 0xd1000000-0xd1000fff,0xd1001000-0xd1001fff
irq 16 at device 0.0 on pci1
cxgbc0: using MSI-X interrupts (9 vectors)
cxgb0: <Port 0 10GBASE-R> on cxgbc0
Fatal trap 12: page fault while in kernel mode

I am not able to reproduce the panic when booting from a FreeBSD 14.1 or
FreeBSD 14.2-PRERELEASE memstick, although I have not tried installing from
one.

Here's the backtrace from the latest 14.2-PRERELEASE kernel:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe00aab0b6f8
frame pointer           = 0x28:0xfffffe00aab0b720
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 367 (devctl)
rdi: fffff80024451000 rsi: fffffe00aab0b770 rdx: fffffe00acbb9ed8
rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000000
rax: 0000000000000000 rbx: fffffe00aab0b770 rbp: fffffe00aab0b720
r10: fffff8003bb06800 r11: 0000000000000800 r12: 0000000000008802
r13: fffff8003bb06810 r14: fffffe00acbb9ed8 r15: 0000000000000000
trap number             = 12
panic: page fault
cpuid = 2
time = 1730034648
KDB: stack backtrace:
#0 0xffffffff80b8b9bd at kdb_backtrace+0x5d
#1 0xffffffff80b3e101 at vpanic+0x131
#2 0xffffffff80b3dfc3 at panic+0x43
#3 0xffffffff81024a0b at trap_fatal+0x40b
#4 0xffffffff81024a56 at trap_pfault+0x46
#5 0xffffffff80ffb538 at calltrap+0x8
#6 0xffffffff80d897e5 at dump_iface+0x145
#7 0xffffffff80d891a9 at rtnl_handle_ifevent+0xa9
#8 0xffffffff80c5c75f at if_attach_internal+0x3df
#9 0xffffffff80c6784c at ether_ifattach+0x2c
#10 0xffffffff83327b53 at cxgb_port_attach+0x1d3
#11 0xffffffff80b7abac at device_attach+0x3ac
#12 0xffffffff80b7be7b at bus_generic_attach+0x4b
#13 0xffffffff83326ab6 at cxgb_controller_attach+0x926
#14 0xffffffff80b7abac at device_attach+0x3ac
#15 0xffffffff80b7a7e1 at device_probe_and_attach+0x41
#16 0xffffffff80818382 at pci_driver_added+0xf2
#17 0xffffffff80b78269 at devclass_driver_added+0x29
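
For triage, the frames above can be pulled apart mechanically. The following is
a minimal Python sketch under stated assumptions: the helper names
parse_frames and first_frame_after_trap are illustrative, the regular
expression assumes the "#N 0xADDR at symbol+0xOFF" layout shown above, and the
embedded sample holds only frames #4 to #7 of this trace. It reports the frame
listed immediately after calltrap, which is the most recent kernel function on
the stack when the trap was taken.

```python
import re

# Matches lines such as "#6 0xffffffff80d897e5 at dump_iface+0x145".
FRAME_RE = re.compile(r"^#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)$")


def parse_frames(text):
    """Return a list of (index, address, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (int(m.group(1)), int(m.group(2), 16),
                 m.group(3), int(m.group(4), 16))
            )
    return frames


def first_frame_after_trap(frames):
    """Return the frame listed right after calltrap, or None if absent."""
    symbols = [f[2] for f in frames]
    if "calltrap" in symbols:
        idx = symbols.index("calltrap")
        if idx + 1 < len(frames):
            return frames[idx + 1]
    return None


# Excerpt of the backtrace above (frames #4 to #7 only).
SAMPLE = """\
#4 0xffffffff81024a56 at trap_pfault+0x46
#5 0xffffffff80ffb538 at calltrap+0x8
#6 0xffffffff80d897e5 at dump_iface+0x145
#7 0xffffffff80d891a9 at rtnl_handle_ifevent+0xa9
"""

if __name__ == "__main__":
    frame = first_frame_after_trap(parse_frames(SAMPLE))
    if frame is not None:
        index, address, symbol, offset = frame
        print(f"#{index} {symbol}+{offset:#x} (return address {address:#x})")
```

Feeding the full backtrace text to parse_frames in place of SAMPLE works the
same way, since lines that do not match the frame layout are skipped.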

I can provide vmcore files from 14.1-RELEASE or 14.2-PRERELEASE, and am happy
to test with other kernels if necessary.
