From: bugzilla-noreply@freebsd.org
To: jail@FreeBSD.org
Subject: [Bug 240106] VNET issue with ARP and routing sockets in jails
Date: Mon, 06 Mar 2023 22:51:19 +0000
List-Id: Discussion about FreeBSD jail(8)

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240106

kvs changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |overwatch@lab.kyngin.net

--- Comment #26 from kvs ---
Hello Everyone!

I believe I have hit the same bug, though I believe my issue is specifically
related to lagg/lacp. I can confirm this problem affects tap as well as epair
interfaces on a bridge when attempting to send over a vlan interface that has
a lagg parent.

System Description: FreeBSD 13.1 w/ Chelsio T6225-SO-CR NIC, identified by
cc0 / cc1 (confirmed up and operational), host25 is the system name. Network
is 10.20.20.0/24, gateway is 10.20.20.254 (mac: 02:11:22:33:44:55), host is
assigned 10.20.20.5, epair0 is assigned to jail-10-20-20-6 (with matching IP
of 10.20.20.6 on epair0b). Switch is set to accept tagged frames only for
vlan 2020. All MTUs are 1500.

When adding a vlan interface child of cc0 to the bridge, I do not have any
trouble passing data over the lagg.

host25# ifconfig cc0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm cc0.2020
host25# ifconfig bridge2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!

(pings from jail -> gateway also work)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
success!

(I now reset bridge2020 to use a lagg interface.)
host25# ifconfig bridge2020 destroy
host25# ifconfig cc0.2020 destroy
host25# ifconfig lagg0 create laggproto lacp laggport cc0 laggport cc1 up
host25# ifconfig lagg0.2020 create up
host25# ifconfig bridge2020 create up
host25# ifconfig bridge2020 addm lagg0.2020 addm epair0a
host25# ifconfig bridge2020 inet 10.20.20.25/24

(pings from host -> gateway work fine)
host25# ping 10.20.20.254
success!

(pings from jail -> gateway time out)
host25# jexec jail-10-20-20-6 sh
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down

(arp cache from jail appears to not include gateway mac)
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at (incomplete) on epair0b expired [ethernet]

(I assign mac statically.)
jail-10-20-20-6# arp -s 10.20.20.254 02:11:22:33:44:55
jail-10-20-20-6# arp -an
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b permanent [ethernet]

(attempt ping again after static arp assignment)
jail-10-20-20-6# ping 10.20.20.254
success!

What comes next is a reasonably big presumption on my part, so hopefully
someone more educated on the topic kindly corrects me where I'm wrong. Seeing
that the vlan interface cc0.2020 works in the bridge when lagg0.2020 is
removed/destroyed, I believe it's possible that the issue is related to arp
responses being sent down one of the two lagg members and the host OS not
being aware of that. Although the reply does come inbound on one of the host
OS interfaces, it doesn't propagate down across the epair / tap. The VM/Jail
then never sees the arp reply, and keeps the arp entry as "(incomplete)" in
its cache. When using a single interface, or a lagg with only a single
interface active, arp appears to work as expected.

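A rough sketch for checking this from saved captures, assuming each
interface's ARP traffic was first written to a text file (for example with
"tcpdump -e -n -l -i cc0 arp > cc0.txt"). The helper name count_arp and the
file names are made up for illustration and are not part of my actual test
setup.

```shell
# Hypothetical helper: tally ARP requests and replies in a saved text
# capture produced by tcpdump. Prints "<file>: requests=N replies=M".
count_arp() {
    file=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    requests=$(grep -c 'Request who-has' "$file" || true)
    replies=$(grep -c 'Reply .* is-at' "$file" || true)
    echo "$file: requests=$requests replies=$replies"
}

# Compare the bridge member and both lagg ports (file names are assumptions)
for f in epair0a.txt cc0.txt cc1.txt; do
    if [ -r "$f" ]; then
        count_arp "$f"
    fi
done
```

If replies are counted in cc0.txt or cc1.txt but never in epair0a.txt, that
would be consistent with the replies arriving on a lagg member without being
forwarded across the bridge.
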
To help observe this, I did the following:

1) From host25, I watched epair0a, cc0, and cc1 using
host25# tcpdump -e -vvv -XX -i [interface]

2) Inside jail-10-20-20-6, I attempted to ping the gateway to generate the
arp traffic:
ping -c 1 -t 1 -q 10.20.20.254
PING 10.20.20.254 (10.20.20.254): 56 data bytes

--- 10.20.20.254 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss

3) Results follow:

# tcpdump -e -vvv -XX -i epair0a
tcpdump: listening on epair0a, link-type EN10MB (Ethernet), capture size
262144 bytes
01:43:54.768801 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe                 ..........
01:43:54.768936 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 56: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000                      ........
01:43:54.768969 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 0806 0001  ................
        0x0010:  0800 0604 0001 0207 f080 de0b 0a14 1406  ................
        0x0020:  0000 0000 0000 0a14 14fe 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............

# tcpdump -e -vvv -XX -i cc0
tcpdump: listening on cc0, link-type EN10MB (Ethernet), capture size 262144
bytes
01:43:54.768822 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q
(0x8100), length 46: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4
(len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe       ..............
01:43:54.769126 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP,
Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55
(oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769171 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP,
Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55
(oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
01:43:54.769221 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype 802.1Q (0x8100), length 64: vlan 2020, p 0, ethertype ARP,
Ethernet (len 6), IPv4 (len 4), Reply 10.20.20.254 is-at 02:11:22:33:44:55
(oui Unknown), length 46
        0x0000:  0207 f080 de0b 0211 2233 4455 8100 07e4  ........"3DU....
        0x0010:  0806 0001 0800 0604 0002 0211 2233 4455  ............"3DU
        0x0020:  0a14 14fe 0207 f080 de0b 0a14 1406 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

# tcpdump -e -vvv -XX -i cc1
tcpdump: listening on cc1, link-type EN10MB (Ethernet), capture size 262144
bytes
01:43:54.768876 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q
(0x8100), length 60: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4
(len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 42
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............
01:43:54.768965 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype 802.1Q
(0x8100), length 64: vlan 2020, p 0, ethertype ARP, Ethernet (len 6), IPv4
(len 4), Request who-has 10.20.20.254 tell 10.20.20.6, length 46
        0x0000:  ffff ffff ffff 0207 f080 de0b 8100 07e4  ................
        0x0010:  0806 0001 0800 0604 0001 0207 f080 de0b  ................
        0x0020:  0a14 1406 0000 0000 0000 0a14 14fe 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

Apparently one arp request is sent over cc0 and two over cc1, and all three
replies come back over cc0. None of the replies appear to enter epair0a. I've
not had any luck changing lagg hashes at this stage to try to force requests
down one of the two lagg members, so instead I downed one of the interfaces
in the lagg.

(bridge2020 is still up with epair0a and lagg0.2020 (lagg0 contains cc0+cc1,
both up))
jail-10-20-20-6# ping 10.20.20.254
ping: sendto: Host is down

host25# ifconfig cc1 down

(confirm arp cache is empty in jail)
jail-10-20-20-6# arp -da
jail-10-20-20-6# ping 10.20.20.254
success!

(using tcpdump, epair0a now sees the arp replies as well (I excluded the
tcpdump for cc0 here because it's largely identical))
# tcpdump -e -vvv -XX -i epair0a
15:23:10.623560 02:07:f0:80:de:0b (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has
10.20.20.254 tell 10.20.20.6, length 28
        0x0000:  0001 0800 0604 0001 0207 f080 de0b 0a14  ................
        0x0010:  1406 0000 0000 0000 0a14 14fe            ............
15:23:10.623916 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4),
Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623924 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4),
Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623926 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4),
Reply 10.20.20.254 is-at 02:11:22:33:44:55 (oui Unknown), length 46
        0x0000:  0001 0800 0604 0002 0211 2233 4455 0a14  .........."3DU..
        0x0010:  14fe 0207 f080 de0b 0a14 1406 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:23:10.623943 02:07:f0:80:de:0b (oui Unknown) > 02:11:22:33:44:55 (oui
Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 56841,
offset 0, flags [none], proto ICMP (1), length 84)
    10.20.20.6 > 10.20.20.254: ICMP echo request, id 22927, seq 0, length 64
        0x0000:  4500 0054 de09 0000 4001 5f74 0a14 1406  E..T....@._t....
        0x0010:  0a14 14fe 0800 8750 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567
15:23:10.624147 02:11:22:33:44:55 (oui Unknown) > 02:07:f0:80:de:0b (oui
Unknown), ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54016,
offset 0, flags [none], proto ICMP (1), length 84)
    10.20.20.254 > 10.20.20.6: ICMP echo reply, id 22927, seq 0, length 64
        0x0000:  4500 0054 d300 0000 4001 6a7d 0a14 14fe  E..T....@.j}....
        0x0010:  0a14 1406 0000 8f50 598f 0000 0006 2ec0  .......PY.......
        0x0020:  15c1 e795 0809 0a0b 0c0d 0e0f 1011 1213  ................
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567

(arp cache seems valid as well)
jail-10-20-20-6# arp -na
? (10.20.20.6) at 02:07:f0:80:de:0b on epair0b permanent [ethernet]
? (10.20.20.254) at 02:11:22:33:44:55 on epair0b expires in 1085 seconds
[ethernet]

Additional thoughts:

1) With lagg0, cc0, and cc1 up, I created a second jail on host25 using
10.20.20.7 (epair1). I added epair1a to bridge2020 (now including epair0a,
epair1a and lagg0.2020). When I attempt to ping from jail-10-20-20-6 to .254
I get a timeout as previously experienced. Pinging from .6 to .7 works
without any trouble, regardless of whether the lagg0 members cc0/cc1 are up
or down. This was expected, as those packets should never traverse
lagg0.2020, but I did want to test/confirm.

2) I did run some ping tests with untagged lagg0 in the bridge, and it does
appear to work without trouble. I removed lagg0.2020 from bridge2020, then
added lagg0 to bridge2020, and set the switch ports as untagged in the
switch. The packets appear to move without trouble even with both cc0+cc1 up.
I need to test this further to be conclusive, but it felt less important to
perform at this time as it doesn't meet my requirement for tagged ports.

3) I have a few bhyve VMs that I've added as tests (tap0, tap1, etc.) to
bridge2020. The results seem to be largely consistent with jails. You could
replace jail-10-20-20-6 with vm-10-20-20-11 (tested freebsd / openbsd /
windows), for instance, and these same results appear. Packets fail when
originating from tap/vnet and traversing lagg0.2020.

(again, lagg0/lacp is up, includes cc0+cc1, bridge2020 includes lagg0.2020,
tap0, and epair0a devices)
host25# ping 10.20.20.254
success!

vm-10-20-20-11# arp -da
(attempt traverse lagg0.2020)
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down
(try tap0 -> epair0)
vm-10-20-20-11# ping 10.20.20.6
success!

(try tests again with lagg0 member cc1 down)
host25# ifconfig cc1 down
(tap0 -> lagg0.2020 -> 10.20.20.254)
vm-10-20-20-11# ping 10.20.20.254
success!
(again tap0 -> epair0, works as expected)
vm-10-20-20-11# ping 10.20.20.6
success!

(turn cc1 back up, wait about 10 seconds for both laggports to be
distributing)
host25# ifconfig cc1 up
vm-10-20-20-11# arp -da
vm-10-20-20-11# ping 10.20.20.254
ping: sendto: Host is down

(again, only lagg is preventing arp, tap <-> epair in bridge still works
fine)
vm-10-20-20-11# ping 10.20.20.6
success!
jail-10-20-20-6# ping 10.20.20.11
success!

Conclusion: When bridging a vnet/tap interface with a lagg.vlan interface
(vlan interface with a lagg [laggproto lacp] parent), arp replies do not
enter the vnet/tap interface on the bridge when *both* lagg members are up.
By downing one of the two interfaces in the lagg group, arp replies enter the
vnet/tap interface as expected.

Final notes: I've not included it in this post, but I've attempted to remove
all the hardware offloading features from the interfaces
lagg0/lagg0.2020/cc0/cc1, toggled the lagg0 lagghash, toggled sysctls
net.link.lagg.* and net.link.bridge.*, and upgraded to 13-STABLE. No luck
moving data over the lagg until I down one of the two lagg0 interfaces. For
brevity, I used the command 'ping host-ip' in the examples above, and only
displayed a simple response of success/fail. In testing I mostly performed
pings for reasonably long periods (e.g. -c 10 -t 2) to confirm the above
examples.

I'd be happy to help test further if anyone has any suggestions.

Thank you!
-kvs