Re: epair and vnet jail loose connection.
- In reply to: Johan Hendriks : "Re: epair and vnet jail loose connection."
Date: Sun, 13 Mar 2022 10:49:49 UTC
> On 13. Mar 2022, at 11:27, Johan Hendriks <joh.hendriks@gmail.com> wrote:
>
> On Sun, 13 Mar 2022 at 01:17, Michael Gmelin <grembo@freebsd.org> wrote:
>> I also gave it another go (this time with multiple CPUs assigned to the vm), still works just fine - so I think we would need more details about the setup.
>>
>> Would it make sense to share our test setups, so Johan can try to reproduce with them?
>>
>> -m
>>
>>> On 13. Mar 2022, at 00:48, Kristof Provost <kp@freebsd.org> wrote:
>>>
>>> I’m still failing to reproduce.
>>>
>>> Is pf absolutely required to trigger the issue? Is haproxy (i.e. can you trigger it with iperf)?
>>> Is the bridge strictly required?
>>>
>>> Kristof
>>>
>>> On 12 Mar 2022, at 8:18, Johan Hendriks wrote:
>>> For me, this minimal setup reproduces the network drop-off on the haproxy server.
>>>
>>> Two jails: one with haproxy, one with nginx serving the following HTML file.
>>>
>>> <!DOCTYPE html>
>>> <html>
>>> <head>
>>> <title>Page Title</title>
>>> </head>
>>> <body>
>>>
>>> <h1>My First Heading</h1>
>>> <p>My first paragraph.</p>
>>>
>>> </body>
>>> </html>
>>>
>>> From a remote machine I run: hey -h2 -n 10 -c 10 -z 300s https://wp.test.nl
>>> Then a ping from the jail host to the haproxy jail shows the following:
>>>
>>> [ /] > ping 10.233.185.20
>>> PING 10.233.185.20 (10.233.185.20): 56 data bytes
>>> 64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=1 ttl=64 time=0.050 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=2 ttl=64 time=0.041 ms
>>> <SNIP>
>>> 64 bytes from 10.233.185.20: icmp_seq=169 ttl=64 time=0.050 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=170 ttl=64 time=0.154 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=171 ttl=64 time=0.054 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=172 ttl=64 time=0.039 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
>>> 64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
>>> ^C
>>> --- 10.233.185.20 ping statistics ---
>>> 335 packets transmitted, 175 packets received, 47.8% packet loss
>>> round-trip min/avg/max/stddev = 0.037/0.070/0.251/0.040 ms
>>>
>>>
>>> ifconfig
>>> vtnet0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>         options=4c00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
>>>         ether 56:16:e9:80:5e:41
>>>         inet 87.233.191.146 netmask 0xfffffff0 broadcast 87.233.191.159
>>>         inet 87.233.191.156 netmask 0xffffffff broadcast 87.233.191.156
>>>         inet 87.233.191.155 netmask 0xffffffff broadcast 87.233.191.155
>>>         inet 87.233.191.154 netmask 0xffffffff broadcast 87.233.191.154
>>>         media: Ethernet autoselect (10Gbase-T <full-duplex>)
>>>         status: active
>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>> vtnet1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>         options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
>>>         ether 56:16:2c:64:32:35
>>>         media: Ethernet autoselect (10Gbase-T <full-duplex>)
>>>         status: active
>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>>         inet6 ::1 prefixlen 128
>>>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
>>>         inet 127.0.0.1 netmask 0xff000000
>>>         groups: lo
>>>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>> bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>         ether 58:9c:fc:10:ff:82
>>>         inet 10.233.185.1 netmask 0xffffff00 broadcast 10.233.185.255
>>>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>>>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>>>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>>>         member: epair20a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>>>                 ifmaxaddr 0 port 7 priority 128 path cost 2000
>>>         member: epair18a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>>>                 ifmaxaddr 0 port 15 priority 128 path cost 2000
>>>         groups: bridge
>>>         nd6 options=9<PERFORMNUD,IFDISABLED>
>>> bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>         ether 58:9c:fc:10:d9:1a
>>>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>>>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>>>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>>>         member: vtnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>>>                 ifmaxaddr 0 port 1 priority 128 path cost 2000
>>>         groups: bridge
>>>         nd6 options=9<PERFORMNUD,IFDISABLED>
>>> pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
>>>         groups: pflog
>>> epair18a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>         description: jail_web01
>>>         options=8<VLAN_MTU>
>>>         ether 02:77:ea:19:c7:0a
>>>         groups: epair
>>>         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
>>>         status: active
>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>> epair20a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>         description: jail_haproxy
>>>         options=8<VLAN_MTU>
>>>         ether 02:9b:93:8c:59:0a
>>>         groups: epair
>>>         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
>>>         status: active
>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>
>>> jail.conf
>>>
>>> # Global settings applied to all jails.
>>> $domain = "test.nl";
>>>
>>> exec.start = "/bin/sh /etc/rc";
>>> exec.stop = "/bin/sh /etc/rc.shutdown";
>>> exec.clean;
>>>
>>> mount.fstab = "/storage/jails/$name.fstab";
>>>
>>> exec.system_user = "root";
>>> exec.jail_user = "root";
>>> mount.devfs;
>>> sysvshm="new";
>>> sysvsem="new";
>>> allow.raw_sockets;
>>> allow.set_hostname = 0;
>>> allow.sysvipc;
>>> enforce_statfs = "2";
>>> devfs_ruleset = "11";
>>>
>>> path = "/storage/jails/${name}";
>>> host.hostname = "${name}.${domain}";
>>>
>>>
>>> # Networking
>>> vnet;
>>> vnet.interface = "vnet0";
>>>
>>> # Commands to run on host before jail is created
>>> exec.prestart = "ifconfig epair${ip} create up description jail_${name}";
>>> exec.prestart += "ifconfig epair${ip}a up";
>>> exec.prestart += "ifconfig bridge0 addm epair${ip}a up";
>>> exec.created = "ifconfig epair${ip}b name vnet0";
>>>
>>> # Commands to run in jail after it is created
>>> exec.start += "/bin/sh /etc/rc";
>>>
>>> # commands to run in jail when jail is stopped
>>> exec.stop = "/bin/sh /etc/rc.shutdown";
>>>
>>> # Commands to run on host when jail is stopped
>>> exec.poststop = "ifconfig bridge0 deletem epair${ip}a";
>>> exec.poststop += "ifconfig epair${ip}a destroy";
>>> persist;
>>>
>>> web01 {
>>>     $ip = 18;
>>> }
>>>
>>> haproxy {
>>>     $ip = 20;
>>>     mount.fstab = "";
>>>     path = "/storage/jails/${name}";
>>> }
>>>
>>> pf.conf
>>>
>>> #######################################################################
>>> ext_if="vtnet0"
>>> table <bruteforcers> persist
>>> table <torlist> persist
>>> table <ssh-trusted> persist file "/usr/local/etc/pf/ssh-trusted"
>>> table <custom-block> persist file "/usr/local/etc/pf/custom-block"
>>> table <jailnetworks> { 10.233.185.0/24, 192.168.10.0/24 }
>>>
>>> icmp_types = "echoreq"
>>> junk_ports="{ 135,137,138,139,445,68,67,3222,17500 }"
>>>
>>> # Log interface
>>> set loginterface $ext_if
>>>
>>> # Set limits
>>> set limit { states 40000, frags 20000, src-nodes 20000 }
>>>
>>> scrub on $ext_if all fragment reassemble no-df random-id
>>>
>>> # ---- Nat jails to the web
>>> binat on $ext_if from 10.233.185.15/32 to !10.233.185.0/24 -> 87.233.191.156/32 # saltmaste
>>> binat on $ext_if from 10.233.185.20/32 to !10.233.185.0/24 -> 87.233.191.155/32 # haproxy
>>> binat on $ext_if from 10.233.185.22/32 to !10.233.185.0/24 -> 87.233.191.154/32 # web-comb
>>>
>>> nat on $ext_if from <jailnetworks> to any -> ($ext_if:0)
>>>
>>> # ---- First rule obligatory "Pass all on loopback"
>>> pass quick on lo0 all
>>> pass quick on bridge0 all
>>> pass quick on bridge1 all
>>>
>>> # ---- Block TOR exit addresses
>>> block quick proto { tcp, udp } from <torlist> to $ext_if
>>>
>>> # ---- Second rule "Block all in and pass all out"
>>> block in log all
>>> pass out all keep state
>>>
>>> # IPv6 pass in/out all IPv6 ICMP traffic
>>> pass in quick proto icmp6 all
>>>
>>> # Pass all lo0
>>> set skip on lo0
>>>
>>> ############### FIREWALL ###############################################
>>> # ---- Block custom ip's and logs
>>> block quick proto { tcp, udp } from <custom-block> to $ext_if
>>>
>>> # ---- Jail ports
>>> pass in quick on { $ext_if } proto tcp from any to 10.233.185.22 port { smtp 80 443 993 995 1956 } keep state
>>> pass in quick on { $ext_if } proto tcp from any to 10.233.185.20 port { smtp 80 443 993 995 1956 } keep state
>>> pass in quick on { $ext_if } proto tcp from any to 10.233.185.15 port { 4505 4506 } keep state
>>>
>>> # ---- Allow ICMP
>>> pass in inet proto icmp all icmp-type $icmp_types keep state
>>> pass out inet proto icmp all icmp-type $icmp_types keep state
>>>
>>> pass in quick on $ext_if inet proto tcp from any to $ext_if port { 80, 443 } flags S/SA keep state
>>> pass in quick on $ext_if inet proto tcp from <ssh-trusted> to $ext_if port { 4505 4506 } flags S/SA keep state
>>> block log quick from <bruteforcers>
>>> pass quick proto tcp from <ssh-trusted> to $ext_if port ssh flags S/SA keep state
>>>
>>> This is as minimal as I can get it.
>>>
>>> Hope this helps.
>>> regards,
>>> Johan Hendriks
>>>
>>>
>>> On Sat, 12 Mar 2022 at 02:10, Kristof Provost <kp@freebsd.org> wrote:
>>>> On 11 Mar 2022, at 18:55, Michael Gmelin wrote:
>>>> >> On 12. Mar 2022, at 01:21, Kristof Provost <kp@freebsd.org> wrote:
>>>> >>
>>>> >> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>>> >>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>>> >>>> The problem:
>>>> >>>> I have a FreeBSD 14 machine and a FreeBSD 13-stable machine, both running the same jails just to test the workings.
>>>> >>>>
>>>> >>>> The jails that are running are a salt master, a haproxy jail, 2 webservers, 2 varnish servers, and 2 php jails, one for php8.0 and one with 8.1. All the jails are connected to bridge0 and all the jails use vnet.
>>>> >>>>
>>>> >>>> I believe this worked on an older 14-HEAD machine, but I did not do a lot with it back then. When I started testing again after updating the OS, I noticed that one of the varnish jails lost its network connection after running for a few hours. I thought it was just something on HEAD so never really looked at it. But later on, when I started using the jails again and testing a test wordpress site, I noticed that with a simple load test my haproxy jail loses its network connection within one minute. I see nothing in the logs, on the host or in the jail.
>>>> >>>> From the jail I cannot ping the other jails or the IP address of the bridge. I can however ping the jail's own IP address. From the host I also cannot ping the haproxy jail's IP address. If I start a tcpdump on the epaira interface of the haproxy jail, I do see the packets arrive, but not in the jail.
>>>> >>>>
>>>> >>>> I used ZFS to send all the jails to a 13-STABLE machine and copied over the jail.conf file as well as the pf.conf file, and I saw the same behavior.
>>>> >>>>
>>>> >>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this happening. There I can stress test the machine for 10 minutes without a problem, but on 14-HEAD and 13-STABLE the jail's network connection fails within a minute. Only a restart of the jail brings it back online, and it exhibits the same behavior again as soon as I start a simple load test which it should handle nicely.
>>>> >>>>
>>>> >>>> One of the jail hosts is running under VMware and the other is running under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is running under Ubuntu with KVM.
>>>> >>>>
>>>> >>>> Thank you for your time.
>>>> >>>> regards
>>>> >>>> Johan
>>>> >>>>
>>>> >>> I did some bisecting, and the latest commit that works on FreeBSD 13-STABLE is 009a56b2e.
>>>> >>> Commit 2e0bee4c7 ("if_epair: implement fanout") and everything after it shows the symptoms described above.
>>>> >>>
>>>> >> Interestingly I cannot reproduce stalls in simple epair setups.
>>>> >> It would be useful if you could reduce the setup with the problem into a minimal configuration so we can figure out what other factors are involved.
>>>> >
>>>> > If there are clear instructions on how to reproduce, I’m happy to help experimenting (I’m relying heavily on epair at this point).
>>>> >
>>>> > @Kristof: Did you try on bare metal or on vms?
>>>> >
>>>> Both.
>>>>
>>>> Kristof
> I also did a new install, this time based on 13.1-PRERELEASE.
> Copied my haproxy and web01 jails to this machine and have the same problem.
>
> Could it be a sysctl I use? Or a boot/loader.conf setting?
>
> This is my /boot/loader.conf:
> # -- sysinstall generated deltas -- #
>
> autoboot_delay="2" #optional
>
> cryptodev_load="YES"
>
> vbe_max_resolution=1024x768
>
> # disable hyperthreading
> machdep.hyperthreading_allowed=0
>
> # filemon
> filemon_load="YES"
>
> # use gpt ids instead of gptids or disks idents
> kern.geom.label.disk_ident.enable="0"
> kern.geom.label.gpt.enable="1"
> kern.geom.label.gptid.enable="0"
>
> # ZFS
> zfs_load="YES"
>
> My /etc/sysctl.conf:
>
> # $FreeBSD$
> #
> # This file is read when going to multi-user and its contents piped thru
> # ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
> #
> kern.timecounter.hardware=HPET
> # accept queue
> kern.ipc.soacceptqueue=4096
>
> # PF vnet jail
> net.link.bridge.pfil_member=0
> net.link.bridge.pfil_bridge=0
> net.inet.ip.forwarding=1 # (default 0)
> net.inet.tcp.tso=0 # (default 1)
> vfs.zfs.min_auto_ashift=12
>
> If you want, I can give you full root access on this machine.
>
> I do use a machine outside of the host machine to run the hey command. The hosts file points to the alias which is binat for the haproxy jail.
>
> Thank you all for your time on this!
>
> regards
> Johan Hendriks
>

Hi Johan,

Two questions from one of my previous emails:

1. How is web01 configured? (I created a full jail for it like haproxy, as it was unclear to me.)

2.
> devfs_ruleset = "11";

What is in devfs_ruleset 11? (It's not a standard one.) I used "4" in my tests.

Root access might help as well, if we continue to not be able to reproduce.

Cheers
Michael
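
A side note for anyone trying to reproduce this: the ping summary quoted earlier in the thread reports 47.8% loss, but the sequence numbers point to a hard cutoff rather than random loss. The last reply is icmp_seq=174, and 335 - 175 = 160 probes went unanswered after that. The following is only a minimal Python sketch for pulling those numbers out of saved ping output; it is not something anyone in the thread used, the function name `summarize_ping` is made up for illustration, and it assumes the FreeBSD-style summary line shown above.

```python
import re


def summarize_ping(output: str) -> dict:
    """Summarize ping output: loss figures and the last answered sequence number.

    Assumes a summary line of the form
    "N packets transmitted, M packets received" as printed by FreeBSD ping.
    """
    # Every reply line carries an icmp_seq=<n> field.
    seqs = [int(s) for s in re.findall(r"icmp_seq=(\d+)", output)]

    stats = re.search(r"(\d+) packets transmitted, (\d+) packets received", output)
    if stats is None:
        raise ValueError("no ping statistics line found")

    sent, received = int(stats.group(1)), int(stats.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")

    lost = sent - received
    return {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_pct": round(100.0 * lost / sent, 1),
        # Highest sequence number that still got a reply, if any.
        "last_seq": max(seqs) if seqs else None,
    }


# Excerpt of the output quoted in the thread.
sample = """\
64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
--- 10.233.185.20 ping statistics ---
335 packets transmitted, 175 packets received, 47.8% packet loss
"""

print(summarize_ping(sample))
```

If `last_seq + 1` equals `received`, as it does here, every reply arrived before a single cutoff point, which matches the description of the jail dropping off the network and staying gone until it is restarted.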