Re: Performance issues with vnet jails + epair + bridge

From: Doug Rabson <dfr_at_rabson.org>
Date: Sun, 15 Sep 2024 17:01:07 UTC
I just did a throughput test with an iperf3 client on a FreeBSD 14.1 host
with an Intel 10Gb NIC, connecting to an iperf3 server running in a vnet
jail on a TrueNAS host (13.something), also with an Intel 10Gb NIC, and I
get full 10Gb throughput in this setup. In the past, I had to disable LRO
on the TrueNAS host for this to work properly.
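
For reference, the test was roughly the following, with ix0 standing in
for whatever the Intel NIC is actually named on each box and <jail
address> standing in for the address of the vnet jail:

  $ iperf3 -s                        (inside the vnet jail on TrueNAS)
  $ iperf3 -c <jail address> -t 30   (on the FreeBSD 14.1 host)

and the LRO workaround I previously needed on the TrueNAS host was just:

  # ifconfig ix0 -lro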

Doug.



On Sat, 14 Sept 2024 at 11:25, Sad Clouds <cryintothebluesky@gmail.com>
wrote:

> On Sat, 14 Sep 2024 10:45:03 +0800
> Zhenlei Huang <zlei@FreeBSD.org> wrote:
>
> > The overhead of a vnet jail should be negligible, compared to a
> > legacy jail or no jail. Bear in mind that when the VIMAGE option is
> > enabled, there is a default vnet 0. It is not visible via jls and
> > cannot be destroyed. So when you see bottlenecks, as in this case,
> > they are mostly caused by other components such as if_epair, not by
> > the vnet jail itself.
>
> Perhaps this needs a correction: the vnet itself may be OK, but with
> only a single physical NIC on this appliance, I cannot use vnet jails
> without virtualised devices like if_epair(4) and if_bridge(4), and I
> think those may have scalability bottlenecks of their own.
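>
> To be concrete, the wiring is along these lines (ix0, epair0 and the
> jail name devel here are just placeholders for my actual setup):
>
> # ifconfig epair create
> epair0a
> # ifconfig bridge create
> bridge0
> # ifconfig bridge0 addm ix0 addm epair0a up
> # ifconfig epair0a up
> # jail -c name=devel vnet=new vnet.interface=epair0b persist
> # jexec devel ifconfig epair0b inet 192.0.2.10/24 up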
>
> I have a similar setup on Solaris.
>
> Here devel is a Solaris zone with exclusive IP configuration, which I
> think may be similar to FreeBSD vnet. It has a virtual NIC devel/net0
> which operates over the physical NIC also called net0 in the global
> zone:
>
> $ dladm
> LINK                CLASS     MTU    STATE    OVER
> net0                phys      1500   up       --
> net1                phys      1500   up       --
> net2                phys      1500   up       --
> net3                phys      1500   up       --
> pkgsrc/net0         vnic      1500   up       net0
> devel/net0          vnic      1500   up       net0
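>
> For what it's worth, devel/net0 is just the stock anet mechanism on
> Solaris 11, configured roughly like this (the VNIC is then created
> over net0 automatically when the zone boots):
>
> $ zonecfg -z devel
> zonecfg:devel> add anet
> zonecfg:devel:anet> set lower-link=net0
> zonecfg:devel:anet> end
> zonecfg:devel> commit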
>
> If I run a TCP bulk-data benchmark with 64 concurrent threads, 32
> threads for the server process in the global zone and 32 threads for
> the client process in the devel zone, the system spreads the load
> evenly across all CPU cores and none of them sit idle:
>
> $ mpstat -A core 1
>  COR minf mjf xcal  intr ithr  csw icsw migr  smtx  srw  syscl  usr sys  st idl sze
>    0    0   0 2262  2561    4 4744 2085  209  7271    0 747842  272 528   0   0   8
>    1    0   0 3187  4209    2 9102 3768  514 10605    0 597012  221 579   0   0   8
>    2    0   0 2091  3251    7 6768 2884  307  9557    0 658124  244 556   0   0   8
>    3    0   0 1745  1786   16 3494 1520  176  8847    0 746373  273 527   0   0   8
>    4    0   0 2797  2767    3 5908 2414  371  7849    0 692873  253 547   0   0   8
>    5    0   0 2782  2359    5 4857 2012  324  9431    0 684840  251 549   0   0   8
>    6    0   0 4324  4133    0 9138 3592  538 12525    0 516342  191 609   0   0   8
>    7    0   0 2180  3249    0 6960 2926  321  8825    0 697861  257 543   0   0   8
>
> With FreeBSD I tried "options RSS" and increasing "net.isr.maxthreads",
> but this resulted in some really flaky kernel behavior. So I'm thinking
> that if_epair(4) may be OK for low-bandwidth use cases, e.g. testing
> firewall rules, but it is not suitable for things like file/object
> storage servers.
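>
> For the record, the RSS/netisr experiment was roughly this: a custom
> kernel built with
>
>   options RSS
>
> in the kernel config, plus these loader tunables (exact values varied
> while I was testing):
>
>   # /boot/loader.conf
>   net.isr.maxthreads="-1"   # one netisr thread per CPU
>   net.isr.bindthreads="1"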
>
>