terrible if_vmx / vmxnet3 rx performance with lro (post iflib)
Patrick Kelsey
pkelsey at freebsd.org
Tue Feb 25 04:41:09 UTC 2020
On Thu, Feb 20, 2020 at 4:58 PM Josh Paetzel <jpaetzel at freebsd.org> wrote:
>
>
> On Wed, Feb 19, 2020, at 7:17 AM, Andriy Gapon wrote:
> > On 18/02/2020 16:09, Andriy Gapon wrote:
> > > My general experience with post-iflib vmxnet3 is that vmxnet3 has some
> > > peculiarities that result in a certain "impedance mismatch" with iflib.
> > > Although we now have a bit less code and it is a bit more regular, there are a
> > > few significant (for us, at least) problems:
> > > - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126
> > > - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240608
> >
> > By the way, we (Panzura) use these changes to fix or work around the above two
> > problems: https://people.freebsd.org/~avg/iflib-vmx.pz.diff
> >
> > Questions / comments are welcome.
> > Especially from people who worked on iflib.
> >
> > > - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243392
> > > - the problem described above
> > > - a couple of issues that we already fixed or worked around
> > >
> > > We are contemplating locally reverting to the pre-iflib vmxnet3 and we are
> > > wondering if the conversion was really worth it in general.
> >
> >
> > --
> > Andriy Gapon
> >
>
> I'd like to follow this up just to make it 100% clear. The problem is a
> ~4x regression in RX performance. It affects stock FreeBSD, including
> 12.1-RELEASE.
>
> In my 40Gbps connected lab single thread iperf receive went from 9Gbps to
> 2.5Gbps.
>
> If this can't be fixed or looked at I'd heavily suggest looking at
> reverting "iflib"ing change in stock FreeBSD.
>
>
Consider these datapoints I collected this evening:
Hypervisor: ESXi 6.7.0 Build 8169922
Hardware: Xeon E5-1650 v3 @ 3.50GHz (6 physical cores, HT disabled)
iperf3 client: a VM on the same vswitch as the VM under test, running
Ubuntu 18.04.3 LTS with 2 vCPUs, 4GB RAM, and a VMXNET3 interface used only
for traffic to the VM under test (this VMXNET3 has checksum offload,
TSO/GSO, and LRO/GRO enabled)
iperf3 server: running on the VM under test, either a 12.0-RELEASE VM (this
is before the vmx iflib conversion), or a 12.1-RELEASE VM (this is after
the vmx iflib conversion) with r356703 applied (the recent TSO bug fix).
Both VMs have 3 vCPUs, but the vmx interface only uses 1 tx and 1 rx queue,
as hw.pci.honor_msi_blacklist is at its default of 1, so MSI is used.
Test 1: 12.0-RELEASE, single TCP stream receive, standard mtu, TSO enabled,
LRO disabled
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=60039b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -c <12.0 VM IP> -p 1234
Connecting to host <12.0 VM IP>, port 1234
[ 4] local <Ubuntu VM IP> port 44664 connected to <12.0 VM IP> port 1234
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.11 GBytes 9.52 Gbits/sec 1144 529 KBytes
[ 4] 1.00-2.00 sec 1.09 GBytes 9.40 Gbits/sec 1272 369 KBytes
[ 4] 2.00-3.00 sec 1.11 GBytes 9.51 Gbits/sec 1249 344 KBytes
[ 4] 3.00-4.00 sec 1.06 GBytes 9.12 Gbits/sec 1973 369 KBytes
[ 4] 4.00-5.00 sec 1.11 GBytes 9.50 Gbits/sec 1860 370 KBytes
[ 4] 5.00-6.00 sec 1.08 GBytes 9.28 Gbits/sec 1342 396 KBytes
[ 4] 6.00-7.00 sec 1.09 GBytes 9.38 Gbits/sec 1278 563 KBytes
[ 4] 7.00-8.00 sec 1.05 GBytes 8.99 Gbits/sec 1226 372 KBytes
[ 4] 8.00-9.00 sec 1.03 GBytes 8.87 Gbits/sec 1145 400 KBytes
[ 4] 9.00-10.00 sec 1.08 GBytes 9.28 Gbits/sec 1317 354 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 13806 sender
[ 4] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec receiver
Test 2: 12.0-RELEASE, single TCP stream receive, standard mtu, TSO enabled,
LRO enabled
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=60079b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,LRO,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -c <12.0 VM IP> -p 1234
Connecting to host <12.0 VM IP>, port 1234
[ 4] local <Ubuntu VM IP> port 44714 connected to <12.0 VM IP> port 1234
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.48 GBytes 29.9 Gbits/sec 0 887 KBytes
[ 4] 1.00-2.00 sec 1.93 GBytes 16.6 Gbits/sec 0 994 KBytes
[ 4] 2.00-3.00 sec 2.03 GBytes 17.5 Gbits/sec 0 1.10 MBytes
[ 4] 3.00-4.00 sec 1.99 GBytes 17.1 Gbits/sec 0 1.10 MBytes
[ 4] 4.00-5.00 sec 2.00 GBytes 17.1 Gbits/sec 0 1.10 MBytes
[ 4] 5.00-6.00 sec 1.93 GBytes 16.6 Gbits/sec 0 1.10 MBytes
[ 4] 6.00-7.00 sec 2.04 GBytes 17.5 Gbits/sec 0 1.10 MBytes
[ 4] 7.00-8.00 sec 2.01 GBytes 17.3 Gbits/sec 0 1.10 MBytes
[ 4] 8.00-9.00 sec 1.97 GBytes 16.9 Gbits/sec 0 1.10 MBytes
[ 4] 9.00-10.00 sec 1.98 GBytes 17.0 Gbits/sec 0 1.10 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 21.4 GBytes 18.3 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 21.4 GBytes 18.3 Gbits/sec receiver
Test 3: 12.0-RELEASE, single TCP stream receive, standard mtu, TSO enabled,
LRO disabled (LRO disabled and test run after Test 2 above)
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=60039b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -c <12.0 VM IP> -p 1234
Connecting to host <12.0 VM IP>, port 1234
[ 4] local <Ubuntu VM IP> port 44718 connected to <12.0 VM IP> port 1234
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.14 GBytes 9.76 Gbits/sec 1871 338 KBytes
[ 4] 1.00-2.00 sec 483 MBytes 4.05 Gbits/sec 1307 1.41 KBytes
[ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 4] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 4] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.61 GBytes 1.38 Gbits/sec 3181 sender
[ 4] 0.00-10.00 sec 1.60 GBytes 1.38 Gbits/sec receiver
Test 4: 12.0-RELEASE, single TCP stream transmit, standard mtu, TSO
enabled, LRO enabled
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=60079b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,LRO,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -R -c <12.0 VM IP> -p 1234
Connecting to host <12.0 VM IP>, port 1234
Reverse mode, remote host <12.0 VM IP> is sending
[ 4] local <Ubuntu VM IP> port 44726 connected to <12.0 VM IP> port 1234
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 4.28 GBytes 36.8 Gbits/sec
[ 4] 1.00-2.00 sec 3.31 GBytes 28.4 Gbits/sec
[ 4] 2.00-3.00 sec 3.85 GBytes 33.1 Gbits/sec
[ 4] 3.00-4.00 sec 4.24 GBytes 36.5 Gbits/sec
[ 4] 4.00-5.00 sec 3.16 GBytes 27.1 Gbits/sec
[ 4] 5.00-6.00 sec 3.54 GBytes 30.4 Gbits/sec
[ 4] 6.00-7.00 sec 4.03 GBytes 34.6 Gbits/sec
[ 4] 7.00-8.00 sec 2.93 GBytes 25.1 Gbits/sec
[ 4] 8.00-9.00 sec 3.42 GBytes 29.4 Gbits/sec
[ 4] 9.00-10.00 sec 3.93 GBytes 33.8 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 36.7 GBytes 31.5 Gbits/sec 280 sender
[ 4] 0.00-10.00 sec 36.7 GBytes 31.5 Gbits/sec receiver
Test 5: 12.1-RELEASE with r356703 applied, single stream receive, standard
mtu, TSO enabled, LRO disabled
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -c <12.1 VM IP> -p 1234
Connecting to host <12.1 VM IP>, port 1234
[ 4] local <Ubuntu VM IP> port 48392 connected to <12.1 VM IP> port 1234
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 828 MBytes 6.95 Gbits/sec 1247 335 KBytes
[ 4] 1.00-2.00 sec 901 MBytes 7.56 Gbits/sec 1841 345 KBytes
[ 4] 2.00-3.00 sec 909 MBytes 7.62 Gbits/sec 1805 356 KBytes
[ 4] 3.00-4.00 sec 909 MBytes 7.62 Gbits/sec 2337 322 KBytes
[ 4] 4.00-5.00 sec 907 MBytes 7.61 Gbits/sec 1834 354 KBytes
[ 4] 5.00-6.00 sec 907 MBytes 7.61 Gbits/sec 1984 352 KBytes
[ 4] 6.00-7.00 sec 909 MBytes 7.62 Gbits/sec 2189 329 KBytes
[ 4] 7.00-8.00 sec 908 MBytes 7.62 Gbits/sec 2000 338 KBytes
[ 4] 8.00-9.00 sec 907 MBytes 7.61 Gbits/sec 2006 315 KBytes
[ 4] 9.00-10.00 sec 908 MBytes 7.61 Gbits/sec 1764 332 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 8.78 GBytes 7.54 Gbits/sec 19007 sender
[ 4] 0.00-10.00 sec 8.78 GBytes 7.54 Gbits/sec receiver
Test 6: 12.1-RELEASE with r356703 applied, single stream receive, standard
mtu, TSO enabled, LRO disabled, sysctl dev.vmx.0.iflib.tx_abdicate=1
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -c <12.1 VM IP> -p 1234
Connecting to host <12.1 VM IP>, port 1234
[ 4] local <Ubuntu VM IP> port 48416 connected to <12.1 VM IP> port 1234
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.29 GBytes 11.1 Gbits/sec 3016 290 KBytes
[ 4] 1.00-2.00 sec 1.33 GBytes 11.4 Gbits/sec 4133 322 KBytes
[ 4] 2.00-3.00 sec 1.34 GBytes 11.5 Gbits/sec 5409 335 KBytes
[ 4] 3.00-4.00 sec 1.35 GBytes 11.6 Gbits/sec 3899 376 KBytes
[ 4] 4.00-5.00 sec 1.35 GBytes 11.6 Gbits/sec 4609 300 KBytes
[ 4] 5.00-6.00 sec 1.35 GBytes 11.6 Gbits/sec 4603 303 KBytes
[ 4] 6.00-7.00 sec 1.36 GBytes 11.7 Gbits/sec 4417 293 KBytes
[ 4] 7.00-8.00 sec 1.34 GBytes 11.5 Gbits/sec 5680 290 KBytes
[ 4] 8.00-9.00 sec 1.33 GBytes 11.5 Gbits/sec 5461 359 KBytes
[ 4] 9.00-10.00 sec 1.03 GBytes 8.86 Gbits/sec 5060 329 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 13.1 GBytes 11.2 Gbits/sec 46287 sender
[ 4] 0.00-10.00 sec 13.1 GBytes 11.2 Gbits/sec receiver
Test 7: 12.1-RELEASE with r356703 applied, single stream receive, standard
mtu, TSO enabled, LRO enabled
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -c <12.1 VM IP> -p 1234
Connecting to host <12.1 VM IP>, port 1234
[ 4] local <Ubuntu VM IP> port 48396 connected to <12.1 VM IP> port 1234
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 98.5 MBytes 826 Mbits/sec 129 2.83 KBytes
[ 4] 1.00-2.00 sec 63.6 KBytes 521 Kbits/sec 25 2.83 KBytes
[ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 25 2.83 KBytes
[ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 16 2.83 KBytes
[ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 15 2.83 KBytes
[ 4] 5.00-6.00 sec 63.6 KBytes 521 Kbits/sec 15 2.83 KBytes
[ 4] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 15 2.83 KBytes
[ 4] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 12 2.83 KBytes
[ 4] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 15 2.83 KBytes
[ 4] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 11 1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 98.7 MBytes 82.8 Mbits/sec 278 sender
[ 4] 0.00-10.00 sec 97.8 MBytes 82.0 Mbits/sec receiver
Test 8: 12.1-RELEASE with r356703 applied, single stream transmit, standard
mtu, TSO enabled, LRO disabled
======
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
$ iperf3 -R -c <12.1 VM IP> -p 1234
Connecting to host <12.1 VM IP>, port 1234
Reverse mode, remote host <12.1 VM IP> is sending
[ 4] local <Ubuntu VM IP> port 48400 connected to <12.1 VM IP> port 1234
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 4.25 GBytes 36.5 Gbits/sec
[ 4] 1.00-2.00 sec 3.29 GBytes 28.3 Gbits/sec
[ 4] 2.00-3.00 sec 3.61 GBytes 31.0 Gbits/sec
[ 4] 3.00-4.00 sec 3.93 GBytes 33.8 Gbits/sec
[ 4] 4.00-5.00 sec 4.17 GBytes 35.8 Gbits/sec
[ 4] 5.00-6.00 sec 3.53 GBytes 30.3 Gbits/sec
[ 4] 6.00-7.00 sec 3.22 GBytes 27.7 Gbits/sec
[ 4] 7.00-8.00 sec 3.90 GBytes 33.5 Gbits/sec
[ 4] 8.00-9.00 sec 2.80 GBytes 24.1 Gbits/sec
[ 4] 9.00-10.00 sec 2.78 GBytes 23.9 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 35.5 GBytes 30.5 Gbits/sec 571 sender
[ 4] 0.00-10.00 sec 35.5 GBytes 30.5 Gbits/sec receiver
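For anyone who wants to tabulate runs like the ones above, here is a minimal sketch of pulling the end-of-run rate out of iperf3's text output. It is only an illustration: the function name summary_gbps and the regular expression are made up for this example, and it assumes the default 10-second run, so the summary row is the one labelled 0.00-10.00.

```python
import re

# Matches an iperf3 end-of-run summary row such as
# "[ 4] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 13806 sender".
# Assumption: a 10-second run, so per-second rows (0.00-1.00 etc.) never match.
SUMMARY = re.compile(
    r"\[\s*\d+\]\s+0\.00-10\.00\s+sec\s+"
    r"[\d.]+\s+\w+\s+"                           # transfer volume and its unit
    r"(?P<rate>[\d.]+)\s+(?P<unit>[KMG]?)bits/sec"
)

# Multipliers that convert each iperf3 rate unit to Gbit/s.
SCALE = {"": 1e-9, "K": 1e-6, "M": 1e-3, "G": 1.0}


def summary_gbps(text):
    """Return the rate in Gbit/s of the first summary row in text, or None."""
    m = SUMMARY.search(text)
    if m is None:
        return None
    return float(m.group("rate")) * SCALE[m.group("unit")]


# Usage with two rows copied from Test 7: the interval row is skipped and
# the sender summary row is converted from Mbits/sec to Gbit/s.
test7 = (
    "[ 4] 0.00-1.00 sec 98.5 MBytes 826 Mbits/sec 129 2.83 KBytes\n"
    "[ 4] 0.00-10.00 sec 98.7 MBytes 82.8 Mbits/sec 278 sender\n"
)
print(summary_gbps(test7))
```

The first summary row in iperf3's client output is the sender's, so this returns the sender-side figure; the receiver row follows it and could be picked up with re.finditer instead.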
Based on the above, it looks like:
(1) The non-LRO single-stream TCP receive performance of the iflib vmx
driver in 12.1 release lags behind the non-LRO single-stream TCP receive
performance of the pre-iflib vmx driver in 12.0 (by about 20%, 7.54 Gbps
[Test 5] vs 9.28 Gbps [Test 1]), unless tx_abdicate is enabled, in which
case the vmx driver performs better (by about 20%, 11.2 Gbps [Test 6] vs
9.28 Gbps [Test 1]).
(2) The TSO-enabled single-stream TCP send performance of the iflib vmx
driver in 12.1 release (with TSO bug patch applied) is at parity with the
pre-iflib vmx driver in 12.0 (30.5 Gbps [Test 8] and 31.5 Gbps [Test 4]).
(3) There are LRO-related bugs in both the pre-iflib vmx driver in 12.0
(see Test 3) and the iflib vmx driver in 12.1 (see Test 7); they just
surface differently.
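The rough percentages in (1) and (2) can be checked with a few lines of arithmetic. This is just a sketch using the sender-side summary rates from the tests above; the helper name change_pct is made up for the example.

```python
# End-of-run sender throughput in Gbit/s, copied from the test summaries,
# keyed by test number.
gbps = {1: 9.28, 4: 31.5, 5: 7.54, 6: 11.2, 8: 30.5}


def change_pct(new, old):
    """Relative change of new against old, in percent."""
    return (new - old) / old * 100.0


comparisons = [
    ("rx, no LRO: iflib vs pre-iflib", 5, 1),
    ("rx, no LRO: iflib with tx_abdicate=1 vs pre-iflib", 6, 1),
    ("tx, TSO: iflib vs pre-iflib", 8, 4),
]
for label, new, old in comparisons:
    print(f"{label}: {change_pct(gbps[new], gbps[old]):+.1f}%")
```

This comes out at roughly -19%, +21%, and -3% respectively, in line with the "about 20%" and "at parity" characterizations.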
The categories of root causes for bugs and performance issues are: bugs in
the vmx driver, bugs in iflib, and behavioral variations across the many
fielded versions of the VMXNET3 virtual device. Indeed, all of these
categories have been encountered in the past year. Also, there is a rich
set of driver configuration and operating environment parameters, which
makes advancing the overall robustness of the driver (instead of just
shifting issues into or out of one's own operating parameter space) an
arduous task.
I think the right way to approach this is to continue to fill out the test
matrix and root cause and resolve all of the issues encountered, rather
than argue for reverting to the old driver out of frustration based on a
narrow set of (so far, rather poorly characterized) circumstances. I'm in
a position to do this, from the standpoint of substantial knowledge of the
vmx driver and virtual device, as well as of iflib internals, and I will be
doing this, as non-work cycles become available.
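To make "fill out the test matrix" concrete, here is a small sketch that enumerates the combinations of the parameters varied in Tests 1-8 and lists the ones not yet measured. The choice of axes is my simplification for illustration (MTU, queue count, hypervisor version and so on are left out), "12.1" stands for 12.1-RELEASE with r356703 applied, and it assumes that tests which do not mention tx_abdicate ran with it at 0.

```python
from itertools import product

# Cells already measured: (release, direction, lro_enabled, tx_abdicate).
# tx_abdicate is an iflib sysctl, so it is None for the pre-iflib 12.0 driver.
covered = {
    ("12.0", "rx", False, None),  # Tests 1 and 3
    ("12.0", "rx", True, None),   # Test 2
    ("12.0", "tx", True, None),   # Test 4
    ("12.1", "rx", False, 0),     # Test 5
    ("12.1", "rx", False, 1),     # Test 6
    ("12.1", "rx", True, 0),      # Test 7
    ("12.1", "tx", False, 0),     # Test 8
}


def matrix():
    """Yield every cell of the simplified single-stream test matrix."""
    for release, direction, lro in product(("12.0", "12.1"), ("rx", "tx"), (False, True)):
        abdicate_values = (None,) if release == "12.0" else (0, 1)
        for abdicate in abdicate_values:
            yield (release, direction, lro, abdicate)


remaining = [cell for cell in matrix() if cell not in covered]
for cell in remaining:
    print(cell)
```

Even on this reduced set of axes, five of the twelve cells are still unmeasured, including transmit with LRO disabled on 12.0 and receive with both LRO and tx_abdicate enabled on 12.1.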
Best,
Patrick