RE: Very slow scp performance comparing to Linux

From: Wei Hu <weh_at_microsoft.com>
Date: Tue, 29 Aug 2023 07:07:39 UTC

Hi Mark,

Sorry for top-posting, but I don't want to make the thread look too messy. Here is the
information that I left out of my original email.

All VMs are running on Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU).

The FreeBSD VMs have 16 vcpus and 128 GB of memory, running a non-debug build:
14.0-ALPHA1 FreeBSD 14.0-ALPHA1 amd64 1400094 #7 nodbg-n264692-59e706ffee52-dirty... /usr/obj/usr/src/main/amd64.amd64/sys/GENERIC-NODEBUG amd64

The Ubuntu VMs have 4 vcpus and 32 GB of memory, with kernel version:
6.2.0-1009-azure #9~22.04.3-Ubuntu SMP Tue Aug  1 20:51:07 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I ran a couple more tests as suggested by others in this thread. To recap:

scp to localhost, FreeBSD (ufs) vs Ubuntu (ext4): 70 MB/s vs 550 MB/s
scp to localhost, FreeBSD (tmpfs) vs Ubuntu (tmpfs): 630 MB/s vs 660 MB/s

iperf3 single stream to localhost, FreeBSD vs Ubuntu: 30.9 Gb/s vs 48.8 Gb/s

Do these numbers suggest that:
1. ext4 caches a lot more than ufs?
2. there is a TCP performance gap in the network stack between FreeBSD and Ubuntu?
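
To compare the iperf3 and scp figures in the same units, here is a rough sketch of the arithmetic. It assumes iperf3's Gb/s means 10^9 bits per second and treats MB/s as 10^6 bytes per second (scp's units may be binary, which would shift things by a few percent). The function name is only for illustration.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gb/s (10^9 bits/s) to MB/s (10^6 bytes/s). Decimal units assumed."""
    return gbit_per_s * 1000.0 / 8.0


# Figures from the tests above:
# (system, iperf3 Gb/s, scp to ufs/ext4 in MB/s, scp to tmpfs in MB/s)
results = [
    ("FreeBSD", 30.9, 70.0, 630.0),
    ("Ubuntu", 48.8, 550.0, 660.0),
]

for name, iperf_gbit, scp_disk, scp_tmpfs in results:
    tcp_mb = gbit_to_mbyte_per_s(iperf_gbit)
    # Show how much of the loopback TCP rate each scp run actually used.
    print(f"{name}: iperf3 = {tcp_mb:.1f} MB/s, "
          f"scp to disk = {scp_disk / tcp_mb:.1%} of that, "
          f"scp to tmpfs = {scp_tmpfs / tcp_mb:.1%} of that")
```

In these units the iperf3 results are roughly 3.9 GB/s and 6.1 GB/s, so on FreeBSD the loopback TCP rate is about six times the tmpfs scp rate. If that reading is right, the TCP gap alone may not explain the 70 MB/s ufs figure.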

Could you also try running scp on ufs on your bare-metal arm host? I am curious to know how much ufs and zfs differ.
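
If it helps to separate the filesystem from scp itself, a plain sequential write could be timed on each filesystem. This is only a minimal sketch and was not used for the numbers above; the function name, chunk size, and the example paths are my assumptions.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, total_bytes: int,
                          chunk_bytes: int = 1 << 20, sync: bool = True) -> float:
    """Time a sequential write of total_bytes into a temp file in directory.

    Returns MB/s (10^6 bytes per second). With sync=True the timing includes
    an fsync, so data sitting only in the page cache is not counted as written.
    """
    # Random data, so a compressing filesystem cannot shrink it away.
    chunk = os.urandom(chunk_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            remaining = total_bytes
            while remaining > 0:
                n = min(remaining, chunk_bytes)
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            if sync:
                os.fsync(f.fileno())
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.unlink(path)
    return total_bytes / elapsed / 1e6


# Example (hypothetical paths): compare a ufs directory with a tmpfs one.
# print(write_throughput_mb_s("/var/tmp", 2 * 1024**3))
# print(write_throughput_mb_s("/tmp", 2 * 1024**3))
```

Comparing sync=True against sync=False on the same directory would give a rough idea of how much write caching contributes, which bears on question 1 above.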

Thanks,
Wei


> -----Original Message-----
> From: Mark Millard <marklmi@yahoo.com>
> Sent: Tuesday, August 29, 2023 12:16 AM
> To: Wei Hu <weh@microsoft.com>; FreeBSD Hackers <freebsd-hackers@freebsd.org>
> Subject: Re: Very slow scp performance comparing to Linux
> 
> On Aug 28, 2023, at 08:43, Mark Millard <marklmi@yahoo.com> wrote:
> 
> > Wei Hu <weh_at_microsoft.com> wrote on
> > Date: Mon, 28 Aug 2023 07:32:35 UTC :
> >
> >> When I was testing a new NIC, I found the single stream scp performance
> >> was almost 8 times slower than Linux on the RX side. Initially I thought
> >> it might be something with the NIC. But when I switched to sending the
> >> file on localhost, the numbers stayed the same.
> >>
> >> Here I was sending a 2GB file from sender to receiver using scp. FreeBSD
> >> is a recent NON-DEBUG build from CURRENT. The Ubuntu Linux kernel is
> >> 6.2.0. Both run in HyperV VMs on the same type of hardware. The FreeBSD
> >> VM has 16 vcpus, while the Ubuntu VM has 4 vcpus.
> >>
> >> Sender    Receiver    Throughput
> >> Linux     FreeBSD     70 MB/s
> >> Linux     Linux       550 MB/s
> >> FreeBSD   FreeBSD     70 MB/s
> >> FreeBSD   Linux       350 MB/s
> >> FreeBSD   localhost   70 MB/s
> >> Linux     localhost   550 MB/s
> >>
> >> From these tests, it seems I can rule out an issue with the NIC and its
> >> driver. It looks like the FreeBSD kernel network stack is much slower
> >> than Linux on single stream TCP, or is there some problem with scp?
> >>
> >> I also tried setting the following kernel parameters on the FreeBSD
> >> kernel, but they make no difference. Neither do the other tcp cc
> >> algorithms such as htcp and newreno.
> >>
> >> net.inet.tcp.soreceive_stream="1"
> >> net.isr.maxthreads="-1"
> >> net.isr.bindthreads="1"
> >>
> >> net.inet.ip.intr_queue_maxlen=2048
> >> net.inet.tcp.recvbuf_max=16777216
> >> net.inet.tcp.recvspace=419430
> >> net.inet.tcp.sendbuf_max=16777216
> >> net.inet.tcp.sendspace=209715
> >> kern.ipc.maxsockbuf=16777216
> >>
> >> Any ideas?
> >
> >
> > You do not give explicit commands to try. Nor do you specify your
> > hardware context that is involved, just that HyperV is involved.
> >
> > So, on a HoneyComb (16 cortex-A72's) with Optane boot media in its
> > PCIe slot, no HyperV or VM involved, I tried:
> 
> I should have listed the non-debug build in use:
> 
> # uname -apKU
> FreeBSD CA72-16Gp-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #110 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:53 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1500000 1500000
> 
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img   100% 5120MB 120.2MB/s   00:42
> >
> > It is not a high performance system. 64 GiBytes of RAM.
> >
> > So instead trying a ThreadRipper 1950X that also has Optane in a PCIe
> > slot for its boot media, no HyperV or VM involved,
> 
> I should have listed the non-debug build in use:
> 
> # uname -apKU
> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 1500000 #116 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:20 PDT 2023     root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1500000 1500000
> 
> (Same source tree content.)
> 
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img   100% 5120MB 299.7MB/s   00:17
> >
> > (These systems do not run with any tmpfs areas, not even /tmp . So I'm
> > not providing that kind of example, at least for now.)
> >
> > 128 GiBytes of RAM.
> >
> > Both systems are ZFS based but with a simple single partition.
> > (ZFS is used for bectl boot environments, not for the other usual
> > reasons to use ZFS. I could boot UFS variants of the boot media and
> > test that kind of context.)
> >
> > So both results fall between your FreeBSD figure and your Linux figure.
> > I've no means of checking how reasonable the figures are relative to
> > your test context. I just know the results are better than you report
> > for localhost use.
> 
> Adding a Windows Dev Kit 2023 booting via USB3 (but via a
> U.2 adapter to Optane media), again ZFS, again no VM involved:
> 
> # uname -apKU
> FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #13 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:20:31 PDT 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64 aarch64 1500000 1500000
> 
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img   100% 5120MB 168.7MB/s   00:30
> 
> 
> Note: the cortex-a72 and cortex-a78c/x1c builds were optimized via -mcpu=
> use. The ThreadRipper build was not.
> 
> 
> Note: I've not controlled for whether the reads of the input *.img data
> were served from memory caching of prior activity. I could do so if you
> want, by rebooting before the scp command.
> 
> ===
> Mark Millard
> marklmi at yahoo.com