Re: FreeBSD TCP (with iperf3) comparison with Linux
- In reply to: Murali Krishnamurthy : "Re: FreeBSD TCP (with iperf3) comparison with Linux"
Date: Fri, 30 Jun 2023 16:32:08 UTC
I used an emulation testbed from Emulab.net, with a Dummynet traffic shaper adding 100 ms of RTT between the two nodes. The link capacity is 1 Gbps and both nodes run FreeBSD 13.2.

cc@s1:~ % ping -c 3 r1
PING r1-link1 (10.1.1.3): 56 data bytes
64 bytes from 10.1.1.3: icmp_seq=0 ttl=64 time=100.091 ms
64 bytes from 10.1.1.3: icmp_seq=1 ttl=64 time=99.995 ms
64 bytes from 10.1.1.3: icmp_seq=2 ttl=64 time=99.979 ms

--- r1-link1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 99.979/100.022/100.091/0.049 ms

cc@s1:~ % iperf3 -c r1 -t 10 -i 1 -C cubic
Connecting to host r1, port 5201
[  5] local 10.1.1.2 port 56089 connected to 10.1.1.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.19 MBytes  35.2 Mbits/sec    0   1.24 MBytes
[  5]   1.00-2.00   sec  56.5 MBytes   474 Mbits/sec    6   2.41 MBytes
[  5]   2.00-3.00   sec  58.6 MBytes   492 Mbits/sec   18   7.17 MBytes
[  5]   3.00-4.00   sec  65.6 MBytes   550 Mbits/sec   14    606 KBytes
[  5]   4.00-5.00   sec  60.8 MBytes   510 Mbits/sec   18   7.22 MBytes
[  5]   5.00-6.00   sec  62.1 MBytes   521 Mbits/sec   12   7.86 MBytes
[  5]   6.00-7.00   sec  60.9 MBytes   512 Mbits/sec   14   3.43 MBytes
[  5]   7.00-8.00   sec  62.8 MBytes   527 Mbits/sec   16    372 KBytes
[  5]   8.00-9.00   sec  59.3 MBytes   497 Mbits/sec   14   1.77 MBytes
[  5]   9.00-10.00  sec  57.0 MBytes   477 Mbits/sec   18   7.13 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   548 MBytes   459 Mbits/sec  130             sender
[  5]   0.00-10.10  sec   540 MBytes   449 Mbits/sec                  receiver

iperf Done.

cc@s1:~ % ifconfig bce4
bce4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 00:10:18:56:94:d4
        inet 10.1.1.2 netmask 0xffffff00 broadcast 10.1.1.255
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

I believe the default value for the bce(4) tx/rx pages is 2. I had run into this problem before: when the tx queue was full, the driver would not enqueue packets and started returning errors, and the TCP layer misinterpreted those errors as losses and retransmitted. After adding hw.bce.tx_pages=4 and hw.bce.rx_pages=4 to /boot/loader.conf and rebooting:

cc@s1:~ % iperf3 -c r1 -t 10 -i 1 -C cubic
Connecting to host r1, port 5201
[  5] local 10.1.1.2 port 20478 connected to 10.1.1.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.15 MBytes  34.8 Mbits/sec    0   1.17 MBytes
[  5]   1.00-2.00   sec  83.1 MBytes   697 Mbits/sec    0   12.2 MBytes
[  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    0   12.2 MBytes
[  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0   12.2 MBytes
[  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec    0   12.2 MBytes
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0   12.2 MBytes
[  5]   6.00-7.00   sec   112 MBytes   938 Mbits/sec    0   12.2 MBytes
[  5]   7.00-8.00   sec   113 MBytes   944 Mbits/sec    0   12.2 MBytes
[  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec    0   12.2 MBytes
[  5]   9.00-10.00  sec   113 MBytes   947 Mbits/sec    0   12.2 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   985 MBytes   826 Mbits/sec    0             sender
[  5]   0.00-10.11  sec   982 MBytes   815 Mbits/sec                  receiver

iperf Done.
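To be explicit about the change, this is all I added (a minimal sketch; only the two tunables are required, the comments and the kenv checks are just illustrative):

# /boot/loader.conf -- enlarge the bce(4) tx/rx buffer rings (default is 2 pages each)
hw.bce.tx_pages=4
hw.bce.rx_pages=4

# after the reboot, confirm the tunables made it into the kernel environment
kenv hw.bce.tx_pages
kenv hw.bce.rx_pages

With the deeper tx ring the driver no longer hits the full-queue condition during cubic's bursts, which appears to be why the Retr column stays at 0 in the run above.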
Best Regards,
Cheng Cui

On Fri, Jun 30, 2023 at 12:26 PM Murali Krishnamurthy <muralik1@vmware.com> wrote:

> Richard,
>
> Appreciate the useful inputs you have shared so far. Will try to figure
> out regarding packet drops.
>
> Regarding HyStart, I see even the BSD code base has support for this. May I
> know by when we can see that in a release, if not already available?
>
> Regarding this point: "Switching to other cc modules may give some more
> insights. But again, I suspect that momentary (microsecond) burstiness of
> BSD may be causing this significantly higher loss rate."
>
> Is there some info somewhere where I can understand more on this in detail?
>
> Regards
> Murali
>
> On 30/06/23, 9:35 PM, "owner-freebsd-transport@freebsd.org" <
> owner-freebsd-transport@freebsd.org> wrote:
>
> Hi Murali,
>
> > Q. Since you mention two hypervisors - what is the physical network
> > topology in between these two servers? What theoretical link rates
> > would be attainable?
> >
> > Here is the topology.
> >
> > Iperf endpoints are on 2 different hypervisors.
> >
> > [Topology diagram: Linux VM1 and BSD 13 VM1 run on ESX Hypervisor 1,
> > Linux VM2 and BSD 13 VM2 run on ESX Hypervisor 2; the two hypervisors
> > are connected by a 10G link via an L2 switch.]
> >
> > The NIC on both ESX servers is of 10G capacity and has the config below.
>
> So, when both VMs run on the same hypervisor, maybe with another VM to
> simulate the 100 ms delay, can you attain a lossless baseline scenario?
>
> > BDP for a 16 MB socket buffer: 16 MB * (1000 ms / 100 ms latency) *
> > 8 bits / 1024 = 1.25 Gbps
> >
> > So theoretically we should see close to 1.25 Gbps of bitrate, and we see
> > Linux reaching close to this number.
>
> Under no loss, yes.
>
> > But BSD is not able to do that.
>
> > Q. Did you run iperf3? Did the transmitting endpoint report any
> > retransmissions between Linux or FBSD hosts?
> >
> > Yes, we used iperf3. I see Linux doing fewer retransmissions compared
> > to BSD.
> > On BSD, the best performance was around 600 Mbps bitrate, and the number
> > of retransmissions seen for that was around 32K.
> > On Linux, the best performance was around 1.15 Gbps bitrate, and the
> > number of retransmissions seen for that was only 2K.
> > So, as you pointed out, the number of retransmissions in BSD could be
> > the real issue here.
>
> There are other cc modules available; but I believe one major deviation is
> that Linux can perform mechanisms like hystart; ACKing every packet when
> the client detects slow start; perform pacing to achieve more uniform
> packet transmissions.
>
> I think the next step would be to find out at which queue those packet
> discards are coming from (external switch? delay generator? vSwitch? Eth
> stack inside the VM?)
>
> Or alternatively, provide your ESX hypervisors with vastly more link
> speed, to rule out any L2-induced packet drops - provided your delay
> generator is not the source when momentarily overloaded.
>
> > Is there a way to reduce this packet loss by fine-tuning some parameters
> > w.r.t. ring buffers or any other areas?
>
> Finding where these arise (looking at queue and port counters) would be
> the next step. But this is not really my specific area of expertise beyond
> the high-level, vendor-independent observations.
>
> Switching to other cc modules may give some more insights. But again, I
> suspect that momentary (microsecond) burstiness of BSD may be causing this
> significantly higher loss rate.
>
> TCP RACK would be another option. That stack has pacing, more fine-grained
> timing, the RACK loss recovery mechanisms, etc. Maybe that helps reduce the
> observed packet drops by iperf and, consequently, yield a higher overall
> throughput.
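As a follow-up to Richard's suggestion above, this is roughly how one can experiment with other cc modules or the RACK stack on FreeBSD 13. A sketch from memory, so please check cc(4) and tcp_rack(4) for the details; in particular, on 13.x the RACK stack needs a kernel built with "options TCPHPTS":

# list the available congestion control modules and switch the system default
sysctl net.inet.tcp.cc.available
kldload cc_htcp                        # load another cc module if it is not built in
sysctl net.inet.tcp.cc.algorithm=htcp

# load and select the RACK TCP stack (kernel must include "options TCPHPTS" on 13.x)
kldload tcp_rack
sysctl net.inet.tcp.functions_available
sysctl net.inet.tcp.functions_default=rack

For a quick per-connection comparison of cc modules, iperf3 -C <algo> (as in the runs above) is enough and does not change the system default.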