Re: NFS in bhyve VM mounted via bridge interface

From: Zhenlei Huang <zlei.huang_at_gmail.com>
Date: Mon, 31 Oct 2022 08:03:17 UTC
> On Oct 31, 2022, at 2:02 PM, Paul Procacci <pprocacci@gmail.com> wrote:
> 
> 
> 
> On Mon, Oct 31, 2022 at 12:00 AM John Doherty <bsdlists@jld3.net> wrote:
> I have a machine running FreeBSD 12.3-RELEASE with a zpool that consists 
> of 12 mirrored pairs of 14 TB disks.  I'll call this the "storage 
> server." On that machine, I can write to ZFS file systems at around 950 
> MB/s and read from them at around 1450 MB/s. I'm happy with that.
> 
> I have another machine running Alma linux 8.6 that mounts file systems 
> from the storage server via NFS over a 10 GbE network. On this machine, 
> I can write to and read from an NFS file system at around 450 MB/s. I 
> wish that this were better but it's OK.
> 
> I created a bhyve VM on the storage server that also runs Alma linux 
> 8.6. It has a vNIC that is bridged with the 10 GbE physical NIC and a 
> tap interface:
> 
> [root@ss3] # ifconfig vm-storage
> vm-storage: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 
> mtu 1500
>         ether 82:d3:46:17:4e:ee
>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>         member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>                 ifmaxaddr 0 port 10 priority 128 path cost 2000000
>         member: ixl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>                 ifmaxaddr 0 port 5 priority 128 path cost 2000
>         groups: bridge vm-switch viid-ddece@
>         nd6 options=1<PERFORMNUD>
> 
> I mount file systems from the storage server on this VM via NFS. I can 
> write to those file systems at around 250 MB/s and read from them at 
> around 280 MB/s. This surprised me a little: I thought that this might 
> perform better than or at least as well as the physical 10 GbE network 
> but find that it performs significantly worse.
> 
> All my read and write tests here are stupidly simple, using dd to read 
> from /dev/zero and write to a file or to read from a file and write to 
> /dev/null.
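
For reference, a test of that form is typically something like the following
(the NFS path and sizes here are only placeholders):

# dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=10000
# dd if=/mnt/nfs/testfile of=/dev/null bs=1M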
> 
> Is anyone else either surprised or unsurprised by these results?
> 
> I have not yet tried passing a physical interface on the storage server 
> through to the VM with PCI passthrough, but the machine does have 
> another 10 GbE interface I could use for this. This stuff is all about 
> 3,200 miles away from me so I need to get someone to plug a cable in for 
> me. I'll be interested to see how that works out, though.
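
FWIW, with vm-bhyve (which the vm-switch group above suggests is in use),
passthrough is typically set up by reserving the device at boot and handing
it to the guest, roughly like this; the 4/0/0 selector is only a placeholder
for whatever pciconf -lv reports for the spare 10 GbE port:

pptdevs="4/0/0"        # /boot/loader.conf on the host
passthru0="4/0/0"      # in the guest's vm-bhyve configuration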
> 
> Any comments much appreciated. Thanks.
> 
> 
> 
> I was getting geared up to help you with this and then this happened:
> 
> Host:
> # dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 76.830892 secs (11548259 bytes/sec)
> 
> VM:
> dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 7.430017 secs (119416016 bytes/sec)
> 
> I'm totally flabbergasted.  These results are consistent and not at all what I expected to see.
> I even ran the tests on the VM first and the host second.  Call me confused.

I think you should bypass the local cache while testing. Try iflag=direct; see dd(1).
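
For example, repeating the read with the cache bypassed (same file and block
size as above; a larger bs such as 1M may also be worth trying, since direct
reads skip readahead):

# dd if=17-04-27.mp4 of=/dev/null bs=4096 iflag=direct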

If the input file 17-04-27.mp4 is on NFS, then you could also verify the network I/O with netstat.
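
For instance, watching the per-second interface counters on the FreeBSD side
while the dd runs will show whether the data is really crossing the wire
(ixl0 being the host 10 GbE interface from the bridge output above):

# netstat -w 1 -I ixl0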

> 
> Anyways, that's a problem for me to figure out.
> 
> Back to your problem, I had something typed out concerning checking that rxsums and txsums are turned off
> on the interfaces, or at least seeing if that makes a difference, trying a disk type of nvme, and trying
> ng_bridge w/ netgraph interfaces, but now I'm concluding my house is made of glass -- Hah! -- so until I get
> my house in order I'm going to refrain from providing details.
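
In case it helps once you get to it: checksum and TSO/LRO offload on the
physical bridge member can be toggled with ifconfig, e.g. on the host, with
ixl0 taken from the bridge output quoted above:

# ifconfig ixl0 -rxcsum -txcsum -tso -lro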
> 
> Sorry and thanks!
> ~Paul

Best regards,
Zhenlei