[Bug 282431] Chelsio T540-CR: t5nex0: command 0x16 in mbox 4 timed out (0x4014c010).

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 30 Oct 2024 21:06:02 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=282431

            Bug ID: 282431
           Summary: Chelsio T540-CR: t5nex0: command 0x16 in mbox 4 timed
                    out (0x4014c010).
           Product: Base System
           Version: 15.0-CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: lexi.freebsd@le-fay.org

Created attachment 254694
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=254694&action=edit
kernel messages

host: FreeBSD 15.0-CURRENT #3 lf/main-n269068-2cff93ced1d: Wed Oct 23 02:56:54 BST 2024
srcmastr@hemlock.eden.le-fay.org:/data/build/obj/freebsd/data/build/src/freebsd/lf/main/amd64.amd64/sys/LF

which is src from ~e2414d91d33f31d6f2c9f49eef7a1553b5798c9e.

i have a Chelsio T540-CR installed in a PCIe slot on an ASRock B450m Pro4 R2.0
motherboard with an AMD Ryzen 7 2700X CPU.

i have several bhyve VMs using SR-IOV passthrough: 4 FreeBSD, 2 Linux, 3
MikroTik RouterOS.  this was working fine until just now when the host and VM
networking stopped working.  i tried to do 'ifconfig cxl3 down; ifconfig cxl3 up'
and the kernel logged this:

Oct 30 20:49:25 hemlock kernel: t5nex0: command 0x16 in mbox 4 timed out
(0x4014c010).
Oct 30 20:49:25 hemlock kernel: t5nex0: mbox 4 cmdsent 16a0094700000001
2328050000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000
Oct 30 20:49:25 hemlock kernel: t5nex0: mbox 4 current 16a0094700000001
2328050000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000
Oct 30 20:49:25 hemlock kernel: t5nex0: stop_adapter from 0xfffff80122ad9740,
flags 0x0000024d,0x00000001
Oct 30 20:49:25 hemlock kernel: t5nex0: encountered fatal error, adapter
stopped (1).
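
as a rough aid for reading the mailbox dump above, here is a minimal Python
sketch that parses the "cmdsent" and "current" words and compares them.  it
assumes the command opcode is the most significant byte of the first 64-bit
word, which is at least consistent with the "command 0x16" in the timeout
message.  the helper names (parse_words, opcode, changed_words) are made up
for illustration and are not part of the cxgbe driver or any Chelsio tool.

```python
# Words copied verbatim from the "cmdsent" and "current" kernel log lines.
CMDSENT = (
    "16a0094700000001 2328050000000000 0000000000000000 0000000000000000 "
    "0000000000000000 0000000000000000 0000000000000000 0000000000000000"
)
CURRENT = (
    "16a0094700000001 2328050000000000 0000000000000000 0000000000000000 "
    "0000000000000000 0000000000000000 0000000000000000 0000000000000000"
)


def parse_words(dump):
    """Turn a whitespace-separated hex dump into a list of integers."""
    return [int(word, 16) for word in dump.split()]


def opcode(words):
    """Return the top byte of the first 64-bit word.

    Assumption: this byte holds the firmware command opcode. It matches
    the 0x16 reported in the timeout message, but the layout is not
    confirmed by the bug report itself.
    """
    return words[0] >> 56


def changed_words(sent, current):
    """Return the indexes of the words that differ between the two dumps."""
    return [i for i, (s, c) in enumerate(zip(sent, current)) if s != c]


sent = parse_words(CMDSENT)
current = parse_words(CURRENT)

print(hex(opcode(sent)))             # 0x16
print(changed_words(sent, current))  # []
```

the empty list shows that the mailbox contents at timeout are identical to
the command that was sent.  one possible reading is that the firmware never
wrote a reply into the mailbox, but that is an interpretation, not something
the log states.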

at this point, the only way to recover the network was to reboot.

i have attached the kernel log from the time covering the incident, which
includes a lot of additional diagnostic information.
