RE: Very slow scp performance comparing to Linux

From: Wei Hu <weh_at_microsoft.com>
Date: Tue, 29 Aug 2023 12:55:35 UTC
Hi Mark,

Thanks for the update. It seems the numbers are about the same on zfs and ufs. That's
good to know.

Yes, your numbers on ARM64 are better than mine on Intel. However, my original
intention was to find out why scp on Linux performs much better than FreeBSD
in the same hardware environment.
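To put my Intel VM numbers (from the recap quoted further down) side by side, here is a small Python sketch that just divides the Ubuntu figure by the FreeBSD figure for each test. It uses only the numbers already reported (MB/s for scp, Gb/s for iperf3); the `results` dictionary and the `linux_over_freebsd` helper are illustrative names, nothing more. It doesn't explain the gap, but it shows that the ufs/ext4 case differs by far more than the tmpfs or raw TCP cases.

```python
# Throughput figures reported for the Intel Xeon VMs, as (FreeBSD, Ubuntu).
# Units as reported: MB/s for the scp tests, Gb/s for iperf3.
results = {
    "scp to localhost, ufs vs ext4": (70.0, 550.0),
    "scp to localhost, tmpfs vs tmpfs": (630.0, 660.0),
    "iperf3 single stream to localhost": (30.9, 48.8),
}


def linux_over_freebsd(freebsd: float, linux: float) -> float:
    """Return how many times larger the Linux figure is than the FreeBSD one."""
    return linux / freebsd


if __name__ == "__main__":
    for test, (freebsd, linux) in results.items():
        ratio = linux_over_freebsd(freebsd, linux)
        print(f"{test}: Linux/FreeBSD = {ratio:.2f}x")
```

Running it prints:

scp to localhost, ufs vs ext4: Linux/FreeBSD = 7.86x
scp to localhost, tmpfs vs tmpfs: Linux/FreeBSD = 1.05x
iperf3 single stream to localhost: Linux/FreeBSD = 1.58x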

Would it be possible to try Linux in your ARM64 setup? I am using Ubuntu 22.04 on an
ext4 file system.

Thanks,
Wei 


> -----Original Message-----
> From: Mark Millard <marklmi@yahoo.com>
> Sent: Tuesday, August 29, 2023 7:22 PM
> To: Wei Hu <weh@microsoft.com>
> Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
> Subject: Re: Very slow scp performance comparing to Linux
> 
> [Adding USB3/U.2 Optane UFS Windows Dev Kit 2023 scp examples, no VMs
> involved.]
> 
> On Aug 29, 2023, at 03:27, Mark Millard <marklmi@yahoo.com> wrote:
> 
> > Wei Hu <weh_at_microsoft.com> wrote on
> > Date: Tue, 29 Aug 2023 07:07:39 UTC :
> >
> >> Sorry for the top posting, but I don't want to make it look too
> >> messy. Here is the information that I missed in my original email.
> >>
> >> All VMs are running on Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU).
> >>
> >> FreeBSD VMs have 16 vCPUs and 128 GB of memory, running a non-debug build:
> >> 14.0-ALPHA1 FreeBSD 14.0-ALPHA1 amd64 1400094 #7
> >> nodbg-n264692-59e706ffee52-dirty...
> >> /usr/obj/usr/src/main/amd64.amd64/sys/GENERIC-NODEBUG amd64
> >>
> >> Ubuntu VMs have 4 vCPUs and 32 GB of memory; kernel version:
> >> 6.2.0-1009-azure #9~22.04.3-Ubuntu SMP Tue Aug 1 20:51:07 UTC 2023
> >> x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> I did a couple more tests as suggested by others in this thread. To recap:
> >>
> >> scp to localhost, FreeBSD (ufs) vs Ubuntu (ext4): 70 MB/s vs 550 MB/s
> >> scp to localhost, FreeBSD (tmpfs) vs Ubuntu (tmpfs): 630 MB/s vs 660 MB/s
> >>
> >> iperf3 single stream to localhost, FreeBSD vs Ubuntu: 30.9 Gb/s vs 48.8 Gb/s
> >>
> >> Do these numbers suggest that
> >> 1. ext4 caches a lot more than ufs?
> >> 2. there is a TCP performance gap in the network stack between FreeBSD and Ubuntu?
> >>
> >> Could you also try running scp on ufs on your bare-metal arm host? I am
> >> curious to know how ufs and zfs differ.
> >
> >
> > For this round I'm rebooting between the unxz and the 1st scp,
> > so I'll also have zfs results again. I'll also do a 2nd scp (no
> > reboot) to see if it gets notably different results.
> >
> > . . .
> >
> > Well, I just got FreeBSD main [so: 15] running under Hyper-V on the
> > Windows Dev Kit 2023, so I'm reporting for that first. This was via an
> > ssh session. The context is ZFS. The VM file size is fixed, as is the
> > RAM size. 6 cores (of 8) and 24576 MiBytes (of 32 GiBytes) are assigned
> > to the one FreeBSD instance. The VM file is on the internal
> > NVMe drive in the Windows 11 Pro file system in the default place.
> >
> > (I was having it copy the hard drive media to the VM file when I started
> > this process. Modern Hyper-V no longer seems to support direct use of
> > USB3 physical media. I first had to produce a copy of the material on
> > smaller media so that the fixed-size VM file created from that copy
> > would fit in the NVMe's free space.)
> >
> > # uname -apKU
> > FreeBSD CA78C-WDK23s-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #13 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:20:31 PDT 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64 aarch64 1500000 1500000
> >
> > (The ZFS content is a copy of the USB3 interfaced ZFS Optane media's
> > content previously reported on.
> > So the installed system was built with -mcpu= based optimization, as
> > noted before.)
> >
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 193.6MB/s   00:26
> >
> > # rm ~/FreeBSD-14-TEST.img
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 198.0MB/s   00:25
> >
> >
> > So, faster than what you are reporting for the
> > Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU) context.
> >
> > For reference:
> >
> > # gpart show -pl
> > =>       40  468862055    da0  GPT  (224G)
> >         40      32728         - free -  (16M)
> >      32768     102400  da0p1  wdk23sCA78Cefi  (50M)
> >     135168  421703680  da0p2  wdk23sCA78Czfs  (201G)
> >  421838848   47022080  da0p3  wdk23sCA78Cswp22  (22G)
> >  468860928       1167         - free -  (584K)
> >
> > # zpool list
> > NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> > zwdk23s   200G  79.8G   120G        -         -     0%    39%  1.00x    ONLINE  -
> >
> > (UFS would have notably more allocated and less free for the same size
> > partition.)
> >
> >
> >
> > The results below are based on the HoneyComb (16 Cortex-A72s), since I've
> > got the Hyper-V context going on the Windows Dev Kit 2023 at the
> > moment.
> >
> >
> > UFS first:
> >
> > # uname -apKU
> > FreeBSD HC-CA72-UFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #110 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:53 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1500000 1500000
> >
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 129.7MB/s   00:39
> >
> > # rm ~/FreeBSD-14-TEST.img
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 130.9MB/s   00:39
> >
> >
> > So, faster than what you are reporting for the
> > Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU) context.
> >
> > Note: This is via U.2 Optane 960 GB media on an M.2 adapter, rather
> > than PCIe Optane 960 GB media in the PCIe slot.
> >
> >
> > ZFS second:
> >
> > # uname -apKU
> > FreeBSD CA72-16Gp-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #110 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:53 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1500000 1500000
> >
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > . . .
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 121.1MB/s   00:42
> >
> > # rm ~/FreeBSD-14-TEST.img
> > # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> > (root@localhost) Password for root@CA72-16Gp-ZFS:
> > FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 124.6MB/s   00:41
> >
> >
> > So, faster than what you are reporting for the
> > Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU) context.
> >
> > Note: This is via PCIe Optane 960 GB media in the PCIe slot.
> >
> >
> > UFS was slightly faster than ZFS for the HoneyComb context, but there
> > is also the M.2 vs. PCIe difference.
> >
> 
> # uname -apKU
> FreeBSD CA78C-WDK23-UFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #13 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:20:31 PDT 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64 aarch64 1500000 1500000
> 
> Again, the FreeBSD in operation is an -mcpu= optimized build.
> 
> (Still rebooting first. Then . . .)
> 
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 199.3MB/s   00:25
> 
> # rm ~/FreeBSD-14-TEST.img
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img     100% 5120MB 204.9MB/s   00:24
> 
> 
> So, faster than what you are reporting for the
> Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU)
> context.
> 
> The Windows Dev Kit 2023 figures are generally faster than the
> HoneyComb figures.
> 
> ===
> Mark Millard
> marklmi at yahoo.com