[Bug 240106] VNET issue with ARP and routing sockets in jails
Date: Mon, 06 Mar 2023 22:51:19 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240106

kvs <overwatch@lab.kyngin.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |overwatch@lab.kyngin.net

--- Comment #26 from kvs <overwatch@lab.kyngin.net> ---
Hello Everyone!

I believe I have hit the same bug, though I believe my issue is specifically related to lagg/lacp. I can confirm this problem affects tap as well as epair interfaces on a bridge when attempting to send over a vlan interface that has a lagg parent.

System Description:
FreeBSD 13.1 w/ Chelsio T6225-SO-CR NIC, identified by cc0 / cc1 (confirmed up and operational); host25 is the system name.
Network is 10.20.20.0/24, gateway is 10.20.20.254 (mac: 02:11:22:33:44:55), host is assigned 10.20.20.5, epair0 is assigned to jail-10-20-20-6 (with matching IP of 10.20.20.6 on epair0b).
Switch is set to accept tagged frames only for vlan 2020.
All MTUs are 1500.

When adding a vlan interface child of cc0 to the bridge, I do not have any trouble passing data.

host25# ifconfig cc0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm cc0.2020
host25# ifconfig bridge2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!

(pings from jail -> gateway also work)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
success!

(I now reset bridge2020 to use a lagg interface.)
host25# ifconfig bridge2020 destroy
host25# ifconfig cc0.2020 destroy
host25# ifconfig lagg0 create laggproto lacp laggport cc0 laggport cc1 up
host25# ifconfig lagg0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm lagg0.2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!
(pings from jail -> gateway time out)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down

(arp cache from jail appears to not include gateway mac)
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]

(I assign the mac statically.)
jail-10-20-20-6# arp -s 10.20.20.254 02:11:22:33:44:55
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b permanent [ethernet]

(attempt ping again after static arp assignment)
jail-10-20-20-6# ping 10.20.20.254
success!

What comes next is a reasonably big presumption on my part, so hopefully someone more educated on the topic will kindly correct me where I'm wrong.

Seeing that the vlan interface cc0.2020 works in the bridge when lagg0.2020 is removed/destroyed, I believe it's possible that the issue is related to arp responses being sent down one of the two lagg members and the host OS not being aware of that. Although the reply does come inbound on one of the host OS interfaces, it is not propagated down across the epair / tap. The VM/jail then never sees the arp reply, and keeps the arp entry as "(incomplete)" in its cache.

When using a single interface, or a lagg with only a single interface active, arp appears to work as expected.
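For context on why the reply is expected to reach the jail at all: a transparent learning bridge records the source MAC of every frame on the port it arrived from, so the jail's broadcast request entering on epair0a should teach bridge2020 where 02:07:f0:80:de:0b lives, and the gateway's unicast reply to that MAC should then be forwarded out epair0a. The sketch below models only that textbook behaviour; it is a simplified illustration under that assumption, not FreeBSD's if_bridge code, and the class and method names are hypothetical.

```python
# Toy model of a transparent learning bridge (hypothetical names; this is
# NOT FreeBSD's if_bridge implementation). Multicast handling is not modeled.
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # forwarding database: MAC address -> port last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports a frame arriving on in_port is sent to."""
        # Learn (or refresh) the sender's location from the ingress port.
        self.fdb[src_mac] = in_port
        out = self.fdb.get(dst_mac)
        if dst_mac == BROADCAST or out is None:
            # Broadcast or unknown unicast: flood to every other member port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination sits behind the ingress port: filter the frame.
            return []
        return [out]


JAIL = "02:07:f0:80:de:0b"     # epair0b in jail-10-20-20-6
GATEWAY = "02:11:22:33:44:55"  # 10.20.20.254

bridge = LearningBridge(["epair0a", "lagg0.2020"])
# The jail's broadcast ARP request enters on epair0a and is flooded upstream.
print(bridge.receive("epair0a", JAIL, BROADCAST))
# The gateway's unicast ARP reply is then expected to go out epair0a.
print(bridge.receive("lagg0.2020", GATEWAY, JAIL))
```

In the failing case, the captures show the reply arriving on cc0 but never appearing on epair0a, which is exactly the step this model says should happen.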
To help observe this, I did the following:

1) From host25, I watched epair0a, cc0, and cc1 using:
host25# tcpdump -e -vvv -XX -i [interface]

2) Inside jail-10-20-20-6, I attempted to ping the gateway to generate the arp traffic:
jail-10-20-20-6# ping -c 1 -t 1 -q 10.20.20.254
PING 10.20.20.254 (10.20.20.254): 56 data bytes

--- 10.20.20.254 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss

3) Results follow:

# tcpdump -e -vvv -XX -i epair0a
tcpdump: listening on epair0a, link-type EN10MB (Ethernet), capture size 262144 bytes
01:43:54.768801 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe  ..........
01:43:54.768936 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 56: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000  ........
01:43:54.768969 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000  ............
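To double-check what these hex dumps contain, here is a minimal Python sketch that decodes an Ethernet/IPv4 ARP frame from the hex words tcpdump -XX prints, skipping a single 802.1Q tag when present. It assumes the offset and ASCII columns have been removed and that the dump starts at the Ethernet destination MAC, as in this 01:43:54 run; the function name decode_arp is hypothetical and not part of any tool.

```python
import ipaddress
import struct

ETH_P_8021Q = 0x8100  # 802.1Q VLAN tag protocol identifier
ETH_P_ARP = 0x0806


def _mac(b: bytes) -> str:
    return ":".join(f"{x:02x}" for x in b)


def decode_arp(hex_words: str) -> dict:
    """Decode an Ethernet/IPv4 ARP frame from tcpdump -XX hex words.

    Expects only the hex columns (no 0x0000: offsets, no ASCII column),
    starting at the Ethernet destination MAC. One 802.1Q tag is skipped.
    """
    raw = bytes.fromhex("".join(hex_words.split()))
    (ethertype,) = struct.unpack("!H", raw[12:14])
    offset, vlan = 14, None
    if ethertype == ETH_P_8021Q:
        tci, ethertype = struct.unpack("!HH", raw[14:18])
        vlan = tci & 0x0FFF  # low 12 bits of the TCI carry the VLAN ID
        offset = 18
    if ethertype != ETH_P_ARP:
        raise ValueError(f"not ARP: ethertype 0x{ethertype:04x}")
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", raw[offset:offset + 8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not Ethernet/IPv4 ARP")
    # sha(6) spa(4) tha(6) tpa(4); anything after that is Ethernet padding.
    body = raw[offset + 8:offset + 28]
    if len(body) < 20:
        raise ValueError("truncated ARP payload")
    return {
        "eth_dst": _mac(raw[0:6]),
        "eth_src": _mac(raw[6:12]),
        "vlan": vlan,
        "op": {1: "request", 2: "reply"}.get(oper, str(oper)),
        "sender": (_mac(body[0:6]), str(ipaddress.IPv4Address(body[6:10]))),
        "target": (_mac(body[10:16]), str(ipaddress.IPv4Address(body[16:20]))),
    }


# First request seen on epair0a (untagged).
req = decode_arp("ffff ffff ffff 0207 f080 de0b 0806 0001 0800 0604 0001"
                 " 0207 f080 de0b 0a14 1406 0000 0000 0000 0a14 14fe")
# First reply seen on cc0 (tagged), with the trailing padding left off.
rep = decode_arp("0207 f080 de0b 0211 2233 4455 8100 07e4 0806 0001 0800 0604"
                 " 0002 0211 2233 4455 0a14 14fe 0207 f080 de0b 0a14 1406")
print(req["op"], req["sender"], req["target"])
print(rep["vlan"], rep["op"], rep["sender"], rep["target"])
```

Applied to the tagged reply from the cc0 capture, the decoder reports VLAN 2020 (TCI 0x07e4) and a reply from 10.20.20.254 at 02:11:22:33:44:55 addressed to 02:07:f0:80:de:0b / 10.20.20.6. In other words, the reply on the wire is correctly addressed to the jail's epair0b MAC.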
# tcpdump -e -vvv -XX -i cc0
tcpdump: listening on cc0, link-type EN10MB (Ethernet), capture size 262144 bytes
01:43:54.768822 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe  ..............
01:43:54.769126 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769171 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769221 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

# tcpdump -e -vvv -XX -i cc1
tcpdump: listening on cc1, link-type EN10MB (Ethernet), capture size 262144 bytes
01:43:54.768876 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 60: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000  ............
01:43:54.768965 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

Apparently 1 arp request is sent over cc0 and 2 over cc1, and all 3 replies come back over cc0. None of the replies appear to enter epair0a.

I've not had any luck changing lagg hashes at this stage to try to force requests down one of the two lagg members, so instead I downed one of the interfaces in the lagg.

(bridge2020 is still up with epair0a and lagg0.2020; lagg0 contains cc0+cc1, both up)
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down

host25# ifconfig cc1 down

(confirm arp cache is empty in jail)
jail-10-20-20-6# arp -da
jail-10-20-20-6# ping 10.20.20.254
success!
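With several simultaneous captures, tallying requests and replies per interface makes the asymmetry easy to see at a glance. A small sketch, assuming the tcpdump summary lines (not the hex lines) have been collected per interface; the function name tally_arp is hypothetical, and the sample strings are the tails of the summary lines from the 01:43:54 captures above, trimmed to the ARP part.

```python
from collections import Counter


def tally_arp(captures: dict[str, list[str]]) -> Counter:
    """Count ARP requests and replies per interface from tcpdump summary lines."""
    counts: Counter = Counter()
    for ifname, lines in captures.items():
        for line in lines:
            if "Request who-has" in line:
                counts[(ifname, "request")] += 1
            elif "Reply" in line and " is-at " in line:
                counts[(ifname, "reply")] += 1
    return counts


REPLY = "Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46"
captures = {
    "epair0a": [
        "Request who-has 10.20.20.254 tell 10.20.20.6, length 28",
        "Request who-has 10.20.20.254 tell 10.20.20.6, length 42",
        "Request who-has 10.20.20.254 tell 10.20.20.6, length 46",
    ],
    "cc0": [
        "Request who-has 10.20.20.254 tell 10.20.20.6, length 28",
        REPLY, REPLY, REPLY,
    ],
    "cc1": [
        "Request who-has 10.20.20.254 tell 10.20.20.6, length 42",
        "Request who-has 10.20.20.254 tell 10.20.20.6, length 46",
    ],
}

counts = tally_arp(captures)
for ifname in captures:
    # Counter returns 0 for missing keys, e.g. replies on epair0a.
    print(ifname, counts[(ifname, "request")], "requests,",
          counts[(ifname, "reply")], "replies")
```

This reproduces the observation above: 1 request on cc0, 2 on cc1, all 3 replies on cc0, and no replies on epair0a.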
(using tcpdump, epair0a now sees the arp replies as well; I excluded the tcpdump for cc0 here because it's largely identical)
# tcpdump -e -vvv -XX -i epair0a
15:23:10.623560 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  0001 0800 0604 0001 0207 f080 de0b 0a14  ................
        0x0010:  1406 0000 0000 0000 0a14 14fe  ............
15:23:10.623916 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000  ..............
15:23:10.623924 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000  ..............
15:23:10.623926 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000  ..............
15:23:10.623943 02:07:f0:80:de:0b (oui Unknown) > 02:11:22:33:44:55 (oui Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56841, offset 0, flags [none], proto ICMP (1), length 84)
    10.20.20.6 > 10.20.20.254: ICMP echo request, id 22927, seq 0, length 64
        0x0000:  4500 0054 de09 0000 4001 5f74 0a14 1406  E..T....@._t....
        0x0010:  0a14 14fe 0800 8750 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637  4567
15:23:10.624147 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54016, offset 0, flags [none], proto ICMP (1), length 84)
    10.20.20.254 > 10.20.20.6: ICMP echo reply, id 22927, seq 0, length 64
        0x0000:  4500 0054 d300 0000 4001 6a7d 0a14 14fe  E..T....@.j}....
        0x0010:  0a14 1406 0000 8f50 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637  4567

(arp cache seems valid as well)
jail-10-20-20-6# arp -na
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b expires in 1085 seconds [ethernet]

Additional thoughts:

1) With lagg0, cc0, and cc1 up, I created a second jail on host25 using 10.20.20.7 (epair1). I added epair1a to bridge2020 (now including epair0a, epair1a and lagg0.2020). When I attempt to ping from jail-10-20-20-6 to .254, I get a timeout as previously experienced. Pinging from .6 to .7 appears to work without any trouble, whether lagg0's cc0/cc1 members are up or down. This was expected, as those packets should never traverse lagg0.2020, but I did want to test/confirm.
2) I did run some ping tests with untagged lagg0 in the bridge, and it does appear to work without trouble. I removed lagg0.2020 from bridge2020, then added lagg0 to bridge2020, and set the switch ports as untagged. The packets appear to move without trouble even with both cc0+cc1 up. I need to test this further to be conclusive, but it felt less important at this time, as it doesn't meet my requirement for tagged ports.

3) I have added a few bhyve VMs (tap0, tap1, etc.) to bridge2020 as tests. The results seem to be largely consistent with jails. You could replace jail-10-20-20-6 with vm-10-20-20-11 (tested FreeBSD / OpenBSD / Windows), for instance, and the same results appear. Packets fail when originating from tap/vnet and traversing lagg0.2020.

(again, lagg0/lacp is up and includes cc0+cc1; bridge2020 includes lagg0.2020, tap0, and epair0a)
host25# ping 10.20.20.254
success!
vm-10-20-20-11# arp -da

(attempt to traverse lagg0.2020)
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down

(try tap0 -> epair0)
vm-10-20-20-11# ping 10.20.20.6
success!

(try tests again with lagg0 member cc1 down)
host25# ifconfig cc1 down

(tap0 -> lagg0.2020 -> 10.20.20.254)
vm-10-20-20-11# ping 10.20.20.254
success!

(again tap0 -> epair0, works as expected)
vm-10-20-20-11# ping 10.20.20.6
success!

(turn cc1 back up, wait about 10 seconds for both laggports to be distributing)
host25# ifconfig cc1 up
vm-10-20-20-11# arp -da
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down

(again, only the lagg is preventing arp; tap <-> epair in the bridge still works fine)
vm-10-20-20-11# ping 10.20.20.6
success!
jail-10-20-20-6# ping 10.20.20.11
success!

Conclusion:
When bridging a vnet/tap interface with a lagg.vlan interface (a vlan interface with a lagg [laggproto lacp] parent), arp replies do not enter the vnet/tap interface on the bridge when *both* lagg members are up.
When one of the two interfaces in the lagg group is downed, arp replies enter the vnet/tap interface as expected.

Final notes:
I've not included it in this post, but I've attempted to remove all the hardware offloading features from the interfaces lagg0/lagg0.2020/cc0/cc1, toggled the lagg0 lagghash, toggled the net.link.lagg.* and net.link.bridge.* sysctls, and upgraded to 13-STABLE. No luck moving data over the lagg until I down one of the two lagg0 interfaces.

For brevity, I used the command 'ping host-ip' in the examples above, and only displayed a simple success/fail response. In testing I mostly ran pings for reasonably long periods (ex: -c 10 -t 2) to confirm the above examples.

I'd be happy to help test further if anyone has any suggestions.

Thank you!
-kvs

-- 
You are receiving this mail because:
You are the assignee for the bug.