Re: Very slow scp performance comparing to Linux

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 29 Aug 2023 11:22:26 UTC
[Adding USB3/U.2 Optane UFS Windows Dev Kit 2023 scp examples,
no VMs involved.]

On Aug 29, 2023, at 03:27, Mark Millard <marklmi@yahoo.com> wrote:

> Wei Hu <weh_at_microsoft.com> wrote on
> Date: Tue, 29 Aug 2023 07:07:39 UTC :
> 
>> Sorry for the top posting. But I don't want to make it look too messy. Here is the
>> information that I missed in my original email.
>> 
>> All VMs are running on Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU).
>> 
>> FreeBSD VMs are 16 vcpu with 128 GB memory, in non-debug build:
>> 14.0-ALPHA1 FreeBSD 14.0-ALPHA1 amd64 1400094 #7 nodbg-n264692-59e706ffee52-dirty... /usr/obj/usr/src/main/amd64.amd64/sys/GENERIC-NODEBUG amd64
>> 
>> Ubuntu VMs are 4 vcpu with 32 GB memory, kernel version:
>> 6.2.0-1009-azure #9~22.04.3-Ubuntu SMP Tue Aug 1 20:51:07 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
>> 
>> I did a couple more tests as suggested by others in this thread. In recap:
>> 
>> Scp to localhost, FreeBSD (ufs) vs Ubuntu (ext4): 70 MB/s vs 550 MB/s
>> Scp to localhost, FreeBSD (tmpfs) vs Ubuntu (tmpfs): 630 MB/s vs 660 MB/s
>> 
>> Iperf3 single stream to localhost: FreeBSD vs Ubuntu: 30.9 Gb/s vs 48.8 Gb/s
>> 
>> Would these numbers suggest that
>> 1. ext4 caches a lot more than ufs?
>> 2. there is a tcp performance gap in the network stack between FreeBSD and Ubuntu?
>> 
>> Would you also try running scp on ufs on your bare metal arm host? I am curious to know how different ufs and zfs are.
> 
> 
> For this round I'm rebooting between the unxz and the 1st scp.
> So I'll also have zfs results again. I'll also do a 2nd scp
> (no reboot) to see if it gets notably different results.
> 
> . . .
> 
> Well, I just got FreeBSD main [so: 15] running under
> HyperV on the Windows Dev Kit 2023. So reporting for
> there first. This was via an ssh session. The context
> is ZFS. The VM file size is fixed, as is the RAM size.
> 6 cores (of 8) and 24576 MiBytes assigned (of 32
> GiBytes) to the one FreeBSD instance. The VM file is
> on the internal NVMe drive in the Windows 11 Pro file
> system in the default place.
> 
> (I was having it copy the hard drive media to the VM file
> when I started this process. Modern HyperV no longer
> seems to support direct use of USB3 physical media. I
> first had to produce a copy of the material on smaller
> media so that a fixed VM file size from a copy to
> create the VM file would fit in the NVMe's free space.)
> 
> # uname -apKU
> FreeBSD CA78C-WDK23s-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #13 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:20:31 PDT 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64 aarch64 1500000 1500000
> 
> (The ZFS content is a copy of the USB3 interfaced
> ZFS Optane media's content previously reported on.
> So the installed system was built with -mcpu= based
> optimization, as noted before.)
> 
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                            100% 5120MB 193.6MB/s   00:26
> 
> # rm ~/FreeBSD-14-TEST.img
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                            100% 5120MB 198.0MB/s   00:25
> 
> 
> So, faster than what you are reporting for the
> Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU)
> context.
> 
> For reference:
> 
> # gpart show -pl
> =>       40  468862055    da0  GPT  (224G)
>         40      32728         - free -  (16M)
>      32768     102400  da0p1  wdk23sCA78Cefi  (50M)
>     135168  421703680  da0p2  wdk23sCA78Czfs  (201G)
>  421838848   47022080  da0p3  wdk23sCA78Cswp22  (22G)
>  468860928       1167         - free -  (584K)
> 
> # zpool list
> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> zwdk23s   200G  79.8G   120G        -         -     0%    39%  1.00x    ONLINE  -
> 
> (UFS would have notably more allocated and less free
> for the same size partition.)
> 
> 
> 
> The below is based on the HoneyComb (16 Cortex-A72s)
> since I've got the HyperV context going on the Windows
> Dev Kit 2023 at the moment.
> 
> 
> UFS first:
> 
> # uname -apKU
> FreeBSD HC-CA72-UFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #110 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:53 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1500000 1500000
> 
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                            100% 5120MB 129.7MB/s   00:39
> 
> # rm ~/FreeBSD-14-TEST.img
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                            100% 5120MB 130.9MB/s   00:39
> 
> 
> So, faster than what you are reporting for the
> Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU)
> context.
> 
> Note: This is via a U.2 Optane 960 GB media and an M.2 adapter
> instead of being via a PCIe Optane 960 GB media in the PCIe
> slot.
> 
> 
> ZFS second:
> 
> # uname -apKU
> FreeBSD CA72-16Gp-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #110 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:19:53 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1500000 1500000
> 
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> . . .
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                            100% 5120MB 121.1MB/s   00:42
> 
> # rm ~/FreeBSD-14-TEST.img
> # scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
> (root@localhost) Password for root@CA72-16Gp-ZFS:
> FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                            100% 5120MB 124.6MB/s   00:41
> 
> 
> So, faster than what you are reporting for the
> Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU)
> context.
> 
> Note: This is via a PCIe Optane 960 GB media in the
> PCIe slot.
> 
> 
> UFS was slightly faster than ZFS for the HoneyComb
> context but there is the M.2 vs. PCIe difference
> as well.
> 

# uname -apKU
FreeBSD CA78C-WDK23-UFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #13 main-n265027-2f06449d6429-dirty: Fri Aug 25 09:20:31 PDT 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64 aarch64 1500000 1500000

Again, a -mcpu= optimized build context for the FreeBSD in
operation.

(Still rebooting first. Then . . .)

# scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
. . .
FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                              100% 5120MB 199.3MB/s   00:25

# rm ~/FreeBSD-14-TEST.img
# scp FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img root@localhost:FreeBSD-14-TEST.img
. . .
FreeBSD-14.0-ALPHA2-arm-armv7-GENERICSD-20230818-77013f29d048-264841.img                                                                                              100% 5120MB 204.9MB/s   00:24


So, faster than what you are reporting for the
Intel(R) Xeon(R) Platinum 8473C (2100.00-MHz K8-class CPU)
context.

The Windows Dev Kit 2023 figures are generally faster than the
HoneyComb figures.
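
For a rough side-by-side, the following is a minimal sketch that
averages the two scp-reported rates for each context above and
divides by the 70 MB/s FreeBSD (ufs) scp-to-localhost figure that
was reported for the Xeon VM. The labels and the speedups() helper
are my own illustrative names, and averaging just two runs is an
assumption of convenience, not a careful benchmark methodology.

```python
# scp-reported rates in MB/s, copied from the runs in this thread.
# Each tuple is (1st scp after reboot, 2nd scp without a reboot).
# The labels are illustrative names, not hostnames.
runs = {
    "WDK23 HyperV ZFS": (193.6, 198.0),
    "WDK23 bare-metal UFS": (199.3, 204.9),
    "HoneyComb UFS": (129.7, 130.9),
    "HoneyComb ZFS": (121.1, 124.6),
}

# FreeBSD (ufs) scp-to-localhost rate reported for the Xeon VM, in MB/s.
REPORTED_XEON_UFS = 70.0


def speedups(runs, baseline):
    """Return {label: mean scp rate / baseline} for each context."""
    return {
        label: (sum(rates) / len(rates)) / baseline
        for label, rates in runs.items()
    }


if __name__ == "__main__":
    for label, ratio in speedups(runs, REPORTED_XEON_UFS).items():
        print(f"{label:22s} {ratio:.2f}x the reported 70 MB/s")
```

The ratios only compare scp's own progress-line figures. They do
not separate out storage media, interface or CPU differences
between the contexts.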

===
Mark Millard
marklmi at yahoo.com