vnet/if_bridge: ARP issues: connection failure

From: FreeBSD User <freebsd_at_walstatt-de.de>
Date: Sat, 14 May 2022 09:39:23 UTC
Hello,

The problem I'm about to report is not specific to CURRENT; I report it to this list
because I'm a list member. The problem occurs on FreeBSD 12.3-RELEASE-p5.

Setup: connecting to vnet jails bound to a dedicated NIC via if_bridge results in
erratic behaviour (from my point of view). The box has two NICs: em0, which is dedicated
to managing the host, and igb0 for services like NFS, SMB and jails (the host is de facto
a Xigmanas box). The NIC igb0 also has an IPv4 address which is accessible without
problems (sshd is listening on em0 and igb0, and any service bound to igb0 and its IP
itself works like a charm, except anything that is connected via if_bridge/vnet). Both
em0 and igb0 are members of the same network and connected to the same switch (I guess;
it's the campus infrastructure and I have no access to it).

Phenomenon: trying to ping a jail initially results in a long wait and often in
"Host is down", but then, suddenly, the jail responds. Trying to connect to
port 22/tcp of any jail doesn't work in 90% of the cases, but randomly a host (out of
five) does respond and the connection can be established. Terminating the connection and
trying again then fails in 99% of the cases. Once connected, the ssh connection freezes
after a couple of seconds of inactivity.

Checking ARP on the jail (login via host and jexec) and listening via
tcpdump -XXvi vnet0 arp
on a jail while pinging the jail from the network shows the typical requests, but not
every request is then accompanied by a reply. I'm not well versed in networking and
investigating ARP issues, so I followed some instructions found for ARP issues on FBSD,
vnet and routing.
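
To put a number on the missing replies, the tcpdump output could be summarised with
something like the sketch below. It is only an illustration: the function name
arp_summary is made up, and it assumes tcpdump's usual text output, where requests
contain "Request who-has" and replies contain "Reply ... is-at". The difference it
prints is a rough indicator only, since gratuitous ARP or replies to requests outside
the capture window are not accounted for.

```shell
#!/bin/sh
# Hypothetical helper: count ARP requests and replies in tcpdump text output
# read from stdin. Assumes lines containing "Request who-has" (requests) and
# "Reply <ip> is-at <mac>" (replies); hex dump lines from -XX are ignored.
arp_summary() {
    awk '
        /Request who-has/ { req++ }
        /Reply .* is-at/  { rep++ }
        END {
            printf "requests=%d replies=%d difference=%d\n", req, rep, req - rep
        }
    '
}

# Possible use inside a jail ("myjail" is a placeholder name); -c makes
# tcpdump exit after 50 packets so that the summary gets printed:
#   jexec myjail tcpdump -nli vnet0 -c 50 arp | arp_summary
```

Running the same capture on the bridge and on igb0 at the same time might show at
which hop the replies get lost.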

MIB settings (on the host itself, vnet untouched):
net.inet.ip.forwarding: 0
net.link.bridge.ipfw: 0
net.link.bridge.allow_llz_overlap: 0
net.link.bridge.inherit_mac: 0
net.link.bridge.log_stp: 0
net.link.bridge.pfil_local_phys: 0
net.link.bridge.pfil_member: 0
net.link.bridge.ipfw_arp: 0
net.link.bridge.pfil_bridge: 0
net.link.bridge.pfil_onlyip: 0

I also realised that checksum errors occurred on the igb0 interface while rxcsum is
enabled. I disabled special features via "ifconfig igb0 -rxcsum -txcsum -tso -lro".
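
Whether the offloads are really off afterwards can be read from the "options=" line
that ifconfig prints for the interface. A small sketch for checking this follows; the
function name offloads_enabled is hypothetical, and it assumes the usual FreeBSD
format "options=<hex><FLAG,FLAG,...>".

```shell
#!/bin/sh
# Hypothetical helper: read ifconfig output from stdin and list which of the
# offload capabilities RXCSUM, TXCSUM, TSO4, TSO6 and LRO are still shown in
# an "options=...<...>" line. Prints "none" if none of them is present.
offloads_enabled() {
    awk '
        /options=/ && /</ {
            s = $0
            sub(/^[^<]*</, "", s)   # drop everything up to the first "<"
            sub(/>.*$/, "", s)      # drop the closing ">" and anything after
            n = split(s, caps, ",")
            for (i = 1; i <= n; i++)
                if (caps[i] ~ /^(RXCSUM|TXCSUM|TSO4|TSO6|LRO)$/)
                    out = out (out == "" ? "" : " ") caps[i]
        }
        END { print (out == "" ? "none" : out) }
    '
}

# Possible use on the host:
#   ifconfig igb0 | offloads_enabled
```

If this still lists a flag after the ifconfig call above, the setting did not take
effect on that interface.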

I'm out of ideas here.

On another box with the same base OS and a similar setup (two NICs, same purpose, also
with if_bridge and vnet-attached jails), but with the difference that the second NIC
resides on a different network and is connected to a different switch, there is no
problem.
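
One way to rule out a configuration difference between the two boxes would be to save
the relevant MIBs on each and compare them. The following is a sketch under
assumptions: the function name mib_diff and the file names are made up, and the input
is expected to be "name: value" lines as printed by sysctl.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved sysctl dumps ("name: value" lines)
# and print every MIB from the second file whose value differs from, or is
# missing in, the first file.
mib_diff() {
    awk -F': ' '
        NR == FNR  { a[$1] = $2; next }   # first file: remember the values
        !($1 in a) { printf "%s: (missing) -> %s\n", $1, $2; next }
        (a[$1] "") != ($2 "") { printf "%s: %s -> %s\n", $1, a[$1], $2 }
    ' "$1" "$2"
}

# Possible use (file names are placeholders):
#   sysctl net.inet.ip.forwarding net.link.bridge > good-box.txt   # working box
#   sysctl net.inet.ip.forwarding net.link.bridge > bad-box.txt    # failing box
#   mib_diff good-box.txt bad-box.txt
```

No output would mean the listed MIBs are identical on both boxes, which would point
away from the host settings and towards the network or switch side.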

Either there is a serious bug in 12.3-p5 (haven't had the chance to check on 13/CURRENT)
or I'm doing something terribly wrong.

Some technical details:




em0@pci0:0:25:0:        class=0x020000 card=0x29828016 chip=0x153b8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I217-V'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xf7d00000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xf7d35000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0xf080, size 32, enabled
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP


igb0@pci0:4:0:0:        class=0x020000 card=0x00028086 chip=0x15848086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xf7900000, size 1048576, enabled
    bar   [1c] = type Memory, range 32, base 0xf7a00000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM L1(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 xxxxxxxxxxxxxxxxxxxxx
    ecap 0017[1a0] = TPH Requester 1

Kind regards,

oh