[Bug 221122] Attaching interface to a bridge stops all traffic on uplink NIC for few seconds

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 31 Aug 2023 20:18:15 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221122

--- Comment #33 from spork@bway.net ---
Some additional testing here...

There are two workarounds presented in this thread:

- Add "-txcsum -tso4 -tso6 -txcsum6" (or whatever your NIC requires) to the
ifconfig statement for your interface(s) in rc.conf. This requires knowing what
you need to disable so that your NIC and epair have equal capabilities. Then,
when the epair interface is added to the bridge, there's no need to reinit the
NIC to make the capabilities match, and therefore no connectivity loss.

- Pre-plumb the bridge and epair interfaces by adding them to rc.conf's
cloned_interfaces and add the epair to the "addm" ifconfig line. On boot, the
"addm" runs and we don't care about the reinit of the NIC because it's during
boot. This method does not require knowing what capabilities need to be
disabled on the NIC.
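
For reference, the two workarounds might look roughly like this in
/etc/rc.conf. This is only a sketch: the names ixl0, bridge0 and epair0 are
assumptions for illustration (they are not taken from this thread), and the
set of capabilities to disable depends on the NIC.

```shell
# Workaround 1 (sketch): disable TX checksum offload and TSO on the uplink NIC
# at boot so its capabilities already match what the bridge will require.
# "ixl0" is a hypothetical interface name; adjust the flags for your NIC.
ifconfig_ixl0="up -txcsum -txcsum6 -tso4 -tso6"

# Workaround 2 (sketch): create the bridge and epair at boot and add both
# members in one go, so any NIC reinit happens during boot.
# "bridge0" and "epair0" are hypothetical names. With this workaround alone,
# the NIC line above would simply be "up".
cloned_interfaces="bridge0 epair0"
ifconfig_bridge0="addm ixl0 addm epair0a up"
```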

I'm finding that neither of these actually works as a workaround. On 13.2 with
my ixl NICs, both iocage (a jail shutdown or restart) and manual ifconfig
commands (removing a vtnet interface from a bridge) cause the NIC to reinit.
In other words, removing an epair/vtnet interface from a bridge seems to put
the offloading capabilities back in place, rendering either workaround
useless.

Again, I'm not clear on what the fix was that was mentioned in comment #28, so
if I'm way off base here, let me know!

Example follows...

We have a bridge containing my external ixl interface and an epair/vtnet
interface from a jail:

[root@clweb5 /home/spork]# ifconfig bridge0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 58:9c:fc:10:ff:d9
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vnet0.10 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000
        member: ext0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 55
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>

The ext0 (ixl) interface was already a member of the bridge when the jail
started, so there was NO NIC reinit/loss of connectivity when the jail started
(good!).

ext0 options look like this while a member of bridge0 (i.e. txcsum and tso for
v4 and v6 are disabled):

ext0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4a500b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>

Now I manually pull vnet0.10 from the above bridge:

[root@clweb5 /home/spork]# ifconfig bridge0 deletem vnet0.10

And we see connectivity drop for 5 seconds:

Aug 31 15:32:57 clweb5 kernel: vnet0.10: promiscuous mode disabled
Aug 31 15:32:57 clweb5 kernel: ext0: link state changed to DOWN
Aug 31 15:33:02 clweb5 kernel: ext0: Link is up, 1 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: True, Flow Control: None
Aug 31 15:33:02 clweb5 kernel: ext0: link state changed to UP
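
To measure such an outage from the log rather than by eye, something like the
following could be used. It is a minimal sketch: the function name
link_outage_seconds is made up for illustration, it assumes the syslog line
layout shown above (time of day in the third field), and it does not handle
an outage that spans midnight.

```shell
# Print the number of seconds between each "link state changed to DOWN"
# and the next "link state changed to UP" for one interface.
# Reads syslog lines on stdin; $1 is the interface name.
link_outage_seconds() {
    awk -v ifname="$1" '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        index($0, ifname ": link state changed to DOWN") { down = secs($3) }
        index($0, ifname ": link state changed to UP") && down != "" {
            print secs($3) - down
            down = ""
        }
    '
}

# Usage (assumes the kernel messages end up in /var/log/messages):
#   link_outage_seconds ext0 < /var/log/messages
```

Fed the four log lines above, this prints 5, matching the observed drop.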

And we see why - removing the vtnet bridge member causes something(?) to put
all the flags I'd removed from ext0 back in place (txcsum, txcsum6, tso4,
tso6):

[root@clweb5 /home/spork]# ifconfig ext0
ext0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
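
The difference between the two options lines can also be extracted
mechanically. The helper names below (caps_of, caps_added) are made up for
illustration; the sketch only compares the capability names between the angle
brackets and assumes the "options=...<A,B,C>" layout shown in the ifconfig
output above.

```shell
# Print the capability names from an ifconfig "options=...<A,B,C>" line,
# separated by spaces.
caps_of() {
    printf '%s\n' "$1" | sed -e 's/^[^<]*<//' -e 's/>.*$//' | tr ',' ' '
}

# Print, one per line, the capabilities present in $2 but not in $1.
caps_added() {
    _before=" $(caps_of "$1") "
    for _cap in $(caps_of "$2"); do
        case "$_before" in
            *" $_cap "*) ;;                # already set before, skip
            *) printf '%s\n' "$_cap" ;;    # newly set
        esac
    done
}

# The two options lines from this comment: while vnet0.10 was a bridge
# member, and after "ifconfig bridge0 deletem vnet0.10".
opts_in_bridge='options=4a500b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>'
opts_after_deletem='options=4e503bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>'

caps_added "$opts_in_bridge" "$opts_after_deletem"
```

For these two lines it lists TXCSUM, TSO4, TSO6 and TXCSUM_IPV6, which are
exactly the four capabilities that had been disabled.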

Again, this is me manually removing the interface from the bridge, not iocage.

Standard jails and iocage jails both call a "destroy" on the vtnet/epair
interface, so this isn't just an iocage issue.

Sorry this is so long... anyhow the questions again:

- Did the prior workarounds "work" and then stop working later?
- Did the behavior of bringing explicitly-removed flags back to an interface
when members are removed from a bridge change at some point?
- What was the fix in comment #28?
