Re: epair and vnet jail loose connection.
- Reply: Bjoern A. Zeeb: "Re: epair and vnet jail loose connection."
- In reply to: Johan Hendriks : "Re: epair and vnet jail loose connection."
Date: Sun, 13 Mar 2022 16:33:28 UTC
On Sun, 13 Mar 2022 14:32:50 +0100
Johan Hendriks <joh.hendriks@gmail.com> wrote:

> On 13/03/2022 14:06, Patrick M. Hausen wrote:
> > Hi all,
> >
> > I was a bit puzzled by Michael using bhyve trying to reproduce.
> > Up until now I thought bhyve uses tap and not epair?
> >
> > Anyway ...
> >
> >> On 13.03.2022 at 14:01, Johan Hendriks <joh.hendriks@gmail.com> wrote:
> >> I have no idea why it does not work on my setup, which is nothing
> >> out of the ordinary I think: basic full jails connected to a bridge
> >> interface and one of them exposed to the world wide web using pf
> >> binat.
> >
> > What we do is fully exposed VNET jails connected to the bridge
> > on the external interface of the host. ipfw kernel module loaded
> > but not used in this case, i.e. only the "default to accept" rule
> > active in the jails.
> >
> > I will probably downgrade the production host from 13.1-PRERELEASE
> > to 13.0-pX tomorrow and see if that changes anything.
> >
> > Kind regards,
> > Patrick
>
> Downgrading to 13.0-p7 worked for me, it even works on 13.0-STABLE
> until this commit 18 days ago.
> https://freshbsd.org/freebsd/src/commit/2e0bee4c7f8176e0f8396c9389275745bac1e263
>
> After that commit my setup stops working.
>

@all

Johan gave me access to a test system where I could see the problem in action. There's nothing wrong with his config with respect to the issue at hand.

I tried a few more times on my smaller test setup and can now reproduce the issue there as well (with ncpu=2).

I created a reduced test case that triggers the issue every time. It's meant to be run on a dedicated VM or host. It doesn't require pf, bridges, sysctl.conf tuning, or any other special considerations.
/etc/rc.conf is very basic/vanilla:

hostname="johan"
ifconfig_vtnet0="10.1.1.16/24"
defaultrouter="10.1.1.1"
gateway_enable="YES"
sshd_enable="YES"
dumpdev="NO"
zfs_enable="YES"
sendmail_enable="NONE"

Script to test/reproduce:

#!/bin/sh

export PATH=/usr/local/bin:"$PATH"

jname="tj"
ename="epair_$jname"

set -e

echo "====> Install packages"
pkg install -y haproxy hey

echo "====> Remove some leftovers"
(
  killall hey || true
  jail -r "$jname" || true
  ifconfig "$ename" destroy || true
) 2>/dev/null
sleep 1

echo "====> Create interfaces"
intf=$(ifconfig epair create)
jintf=$(echo "$intf" | sed "s|a$|b|")
ifconfig "$intf" name "$ename"
ifconfig "$ename" 10.233.185.1/24

echo "====> Create and start jail"
jail -c vnet name="$jname" persist path=/ \
  host.hostname="$jname" vnet.interface="$jintf"
jexec "$jname" ifconfig lo0 127.0.0.1/8
jexec "$jname" ifconfig "$jintf" 10.233.185.2/24 up
jexec "$jname" route add default 10.233.185.1

cat >/tmp/haproxy.conf<<EOH
global
  daemon
  user www
  group www

defaults
  mode http
  timeout connect 5000ms
  timeout client 50000ms
  timeout server 50000ms

backend default
  mode http

frontend default
  bind 10.233.185.2:80 #443 alpn h2,http/1.1 ssl crt /usr/local/etc/haproxy.pem
  use_backend default
EOH

jexec "$jname" haproxy -f /tmp/haproxy.conf

echo "====> Start hey instances"
hey -h2 -n 10 -c 10 -z 300s http://10.233.185.2&
hey -h2 -n 10 -c 10 -z 300s http://10.233.185.2&
hey -h2 -n 10 -c 10 -z 300s http://10.233.185.2&

echo "====> Ping jail"
ping 10.233.185.2

# EOF

This script can be called multiple times in a row (it tears down what it created in previous runs).

Testing with this script, I get:

====> Install packages
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The most recent versions of packages are already installed
====> Remove some leftovers
tj: removed
====> Create interfaces
epair_tj
====> Create and start jail
add net default: gateway 10.233.185.1
====> Start hey instances
====> Ping jail
PING 10.233.185.2 (10.233.185.2): 56 data bytes
64 bytes from 10.233.185.2: icmp_seq=0 ttl=64 time=0.076 ms
64 bytes from 10.233.185.2: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 10.233.185.2: icmp_seq=2 ttl=64 time=0.086 ms
64 bytes from 10.233.185.2: icmp_seq=3 ttl=64 time=0.158 ms
64 bytes from 10.233.185.2: icmp_seq=4 ttl=64 time=0.081 ms
64 bytes from 10.233.185.2: icmp_seq=5 ttl=64 time=0.093 ms

At that point it gets stuck. The exact moment differs between runs, but it happens every time on my test host, always within a couple of seconds.

It's important to point out that this only happens with kern.ncpu>1. With kern.ncpu==1 nothing gets stuck. This fits the picture perfectly since, as Johan pointed out, the first affected commit[0] is about multicore support.

Cheers
Michael

[0] https://cgit.freebsd.org/src/commit/?id=24f0bfbad57b9c3cb9b543a60b2ba00e4812c286

-- 
Michael Gmelin