[Bug 273372] SR-IOV Networking in Bhyve Causes Chelsio T520-SO-CR to Fail on Host, Kernel Panic if Reset

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 26 Aug 2023 22:31:33 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273372

            Bug ID: 273372
           Summary: SR-IOV Networking in Bhyve Causes Chelsio T520-SO-CR
                    to Fail on Host, Kernel Panic if Reset
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: mark@markmcb.com

If I set up SR-IOV for bhyve using a Chelsio T520-SO-CR network adapter and
start the install process for a FreeBSD guest, then shortly (maybe a minute)
after I load the driver for the PCI device in the guest, the network card
stops responding in both the guest and the host.

# Relevant /boot/loader.conf

vmm_load="YES"
nmdm_load="YES"

t5fw_cfg_load="YES"
if_cxgbe_load="YES"
if_cxgbev_load="YES"

# Relevant /etc/rc.conf

iovctl_files="/etc/iov/cxl1.conf"

vm_enable="YES"
vm_dir="zfs:zapps/bhyve"
vm_list=""
vm_delay="5"


# Relevant /etc/iov/cxl1.conf

PF {
    device : "cxl1";
    num_vfs : 14;
}
DEFAULT {
    passthrough : false;
}
# ...
# VFs for bhyve
VF-10 {
    mac-addr : "aa:88:44:00:02:20";
    passthrough : true;
}
# ...


### pciconf -lvbc in host
ppt0@pci0:2:0:49:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x5807 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T520-SO Unified Wire Ethernet Controller [VF]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xdd00a000, size 4096, enabled
    bar   [18] = type Memory, range 32, base 0xdd060000, size 32768, enabled
    bar   [20] = type Memory, range 32, base 0xdd194000, size 8192, enabled
    cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR NS
                 max read 512
                 link x0(x8) speed 0.0(8.0)
    cap 11[b0] = MSI-X supports 8 messages
                 Table in map 0x20[0x0], PBA in map 0x20[0x8000]
    cap 05[50] = MSI supports 32 messages, 64 bit, vector masks
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 000e[140] = ARI 1
    ecap 0017[150] = TPH Requester 1

Use vm install freebsd-test FreeBSD-13.2-RELEASE-amd64-bootonly.iso
Config line for passthru:
passthru0="2/0/49"
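
The passthru value is the bus/slot/function part of the host selector shown
by pciconf above (pci0:2:0:49 is domain 0, bus 2, slot 0, function 49). As a
minimal sketch, the conversion can be scripted as below; the helper name
selector_to_passthru is hypothetical and is not part of vm-bhyve or FreeBSD.

```sh
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not an existing tool):
# turn a pciconf selector such as "ppt0@pci0:2:0:49:" into the
# "bus/slot/function" string used on the passthru config line.
selector_to_passthru() {
    sel=${1#*@pci}      # drop the "name@pci" prefix -> "0:2:0:49:"
    sel=${sel%:}        # drop the trailing colon    -> "0:2:0:49"

    # Split on ':' into domain, bus, slot, function.
    old_ifs=$IFS
    IFS=:
    set -- $sel
    IFS=$old_ifs

    # Anything that is not exactly four fields is not a valid selector.
    [ "$#" -eq 4 ] || return 1
    printf '%s/%s/%s\n' "$2" "$3" "$4"
}

selector_to_passthru 'ppt0@pci0:2:0:49:'
```

The domain field is dropped because the passthru line in this report carries
only bus, slot, and function.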

Connect with vm console freebsd-test
When the installer starts, choose Shell.

### pciconf -lvbc in guest, prior to driver load
none0@pci0:0:5:0:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x5807 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T520-SO Unified Wire Ethernet Controller [VF]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xc000e000, size 4096, enabled
    bar   [18] = type Memory, range 32, base 0xc0000000, size 32768, enabled
    bar   [20] = type Memory, range 32, base 0xc000c000, size 8192, enabled
    cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR NS
                 max read 512
                 link x0(x8) speed 0.0(8.0)
    cap 11[b0] = MSI-X supports 8 messages
                 Table in map 0x20[0x0], PBA in map 0x20[0x8000]
    cap 05[50] = MSI supports 32 messages, 64 bit, vector masks

# kldload cxlv
t5vf0: <Chelsio T520-SO VF> mem 0xc000e000-0xc000efff,0xc0000000-0xc0007fff,0xc000c000-0xc000dfff at device 5.0 on pci0
t5vf0: 1 ports, 2 MSI-X interrupts, 3 eq, 2 iq
cxlv0: <port 0> on t5vf0
cxlv0: 1 txq, 1 rxq (NIC)

### pciconf -lvbc in guest, after driver load
t5vf0@pci0:0:5:0:       class=0x020000 rev=0x00 hdr=0x00 vendor=0x1425 device=0x5807 subvendor=0x1425 subdevice=0x0000
    vendor     = 'Chelsio Communications Inc'
    device     = 'T520-SO Unified Wire Ethernet Controller [VF]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xc000e000, size 4096, enabled
    bar   [18] = type Memory, range 32, base 0xc0000000, size 32768, enabled
    bar   [20] = type Memory, range 32, base 0xc000c000, size 8192, enabled
    cap 10[70] = PCI-Express 2 endpoint max data 256(2048) FLR NS
                 max read 512
                 link x0(x8) speed 0.0(8.0)
    cap 11[b0] = MSI-X supports 8 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x8000]
    cap 05[50] = MSI supports 32 messages, 64 bit, vector masks

Shortly after loading the driver, I lose networking on the host.
dmesg shows nothing new after the event.

# ifconfig looks normal
cxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP,NOMAP>
        ether 00:07:43:36:bc:80
        inet 10.0.1.201 netmask 0xffffff00 broadcast 10.0.1.255
        media: Ethernet 10Gbase-Twinax <full-duplex,rxpause,txpause>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

# ifconfig cxl0 down
# ifconfig cxl0 up
Aug 26 15:21:11 core18 kernel: t5nex0: command 0x16 in mbox 4 timed out (0x4014c010).
Aug 26 15:21:11 core18 kernel: t5nex0: mbox 4 cmdsent 16a0094400000001 05dc050000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Aug 26 15:21:11 core18 kernel: t5nex0: mbox 4 current 16a0094400000001 05dc050000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Aug 26 15:21:11 core18 kernel: t5nex0: encountered fatal error, adapter stopped (1).
