From: Zhenlei Huang <zlei.huang@gmail.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject: Re: NFS in bhyve VM mounted via bridge interface
Date: Mon, 31 Oct 2022 16:03:17 +0800
Message-Id: <3858240B-7225-4ECB-B4A6-4DE006ED869D@gmail.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

On Oct 31, 2022, at 2:02 PM, Paul Procacci <pprocacci@gmail.com> wrote:



On Mon, Oct 31, 2022 at 12:00 AM John Doherty <bsdlists@jld3.net> wrote:
I have a machine running FreeBSD 12.3-RELEASE with a zpool that consists
of 12 mirrored pairs of 14 TB disks.  I'll call this the "storage
server." On that machine, I can write to ZFS file systems at around 950
MB/s and read from them at around 1450 MB/s. I'm happy with that.

I have another machine running Alma linux 8.6 that mounts file systems
from the storage server via NFS over a 10 GbE network. On this machine,
I can write to and read from an NFS file system at around 450 MB/s. I
wish that this were better but it's OK.

I created a bhyve VM on the storage server that also runs Alma linux
8.6. It has a vNIC that is bridged with the 10 GbE physical NIC and a
tap interface:

[root@ss3] # ifconfig vm-storage
vm-storage: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 82:d3:46:17:4e:ee
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 10 priority 128 path cost 2000000
        member: ixl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000
        groups: bridge vm-switch viid-ddece@
        nd6 options=1<PERFORMNUD>

I mount file systems from the storage server on this VM via NFS. I can
write to those file systems at around 250 MB/s and read from them at
around 280 MB/s. This surprised me a little: I thought that this might
perform better than or at least as well as the physical 10 GbE network
but find that it performs significantly worse.

All my read and write tests here are stupidly simple, using dd to read
from /dev/zero and write to a file or to read from a file and write to
/dev/null.

Is anyone else either surprised or unsurprised by these results?

I have not yet tried passing a physical interface on the storage server
through to the VM with PCI passthrough, but the machine does have
another 10 GbE interface I could use for this. This stuff is all about
3,200 miles away from me so I need to get someone to plug a cable in for
me. I'll be interested to see how that works out, though.

Any comments much appreciated. Thanks.



I was getting geared up to help you with this and then this happened:

Host:
# dd if=17-04-27.mp4 of=/dev/null bs=4096
216616+1 records in
216616+1 records out
887263074 bytes transferred in 76.830892 secs (11548259 bytes/sec)

VM:
dd if=17-04-27.mp4 of=/dev/null bs=4096
216616+1 records in
216616+1 records out
887263074 bytes transferred in 7.430017 secs (119416016 bytes/sec)

I'm totally flabbergasted.  These results are consistent and not at all what I expected to see.
I even ran the tests on the VM first and the host second.  Call me confused.
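
To put the two runs side by side, here is a rough sketch that turns dd's byte and second counts into MB/s and a ratio. The helper names mbps and ratio are made up for illustration; the figures are the ones from the dd output above.

```sh
#!/bin/sh
# mbps BYTES SECONDS: print throughput in MB/s (1 MB = 1,000,000 bytes).
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# ratio A B: print how many times larger A is than B.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

mbps 887263074 76.830892    # host run
mbps 887263074 7.430017     # VM run
ratio 76.830892 7.430017    # host time divided by VM time
```

This prints 11.5, 119.4 and 10.3: the VM read the same file roughly ten times faster than the host did.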

I think you should bypass the local cache while testing. Try iflag=direct; see dd(1).
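
A minimal sketch of such a test, assuming a dd that understands iflag=direct (GNU dd does; for other versions check dd(1) first). The function name read_uncached and the 1 MiB block size are choices made for this example, not anything from the runs above. If the flag is rejected by dd or by the file system, the sketch falls back to an ordinary buffered read and says so.

```sh
#!/bin/sh
# read_uncached FILE: read FILE into /dev/null, asking dd to bypass the
# local cache with iflag=direct. If this dd or file system rejects the
# flag, fall back to a normal buffered read and report that instead.
# dd's own transfer statistics go to stderr as usual.
read_uncached() {
    if dd if="$1" of=/dev/null bs=1048576 iflag=direct; then
        echo "mode: direct"
    elif dd if="$1" of=/dev/null bs=1048576; then
        echo "mode: buffered (iflag=direct not accepted here)"
    else
        return 1
    fi
}
```

For example, `read_uncached 17-04-27.mp4` on both the host and the VM. Only compare the two numbers if both runs report the same mode.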

If the input file 17-04-27.mp4 is on NFS, you could also verify the network I/O with netstat.
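
One way to do that check, as a sketch: sample the interface's received-byte counter before and after the dd run and see whether it grew by about the file size. On FreeBSD, `netstat -I <interface> -b -n` prints cumulative Ibytes and Obytes, and `netstat -I <interface> -w 1` prints per-second rates. The helper crossed_wire and the counter values below are made up for illustration; only the file size comes from the dd output in this thread.

```sh
#!/bin/sh
# crossed_wire FILE_BYTES BEFORE AFTER
# BEFORE and AFTER are the interface's received-byte counter sampled
# around the dd run (for example the Ibytes column of
# `netstat -I <interface> -b -n` on FreeBSD).
# Prints "network" if the counter grew by at least the file size,
# otherwise "cache".
crossed_wire() {
    delta=$(( $3 - $2 ))
    if [ "$delta" -ge "$1" ]; then
        echo "network"
    else
        echo "cache"
    fi
}

crossed_wire 887263074 1000 900001000   # counter grew by 900 MB
crossed_wire 887263074 1000 201000      # counter grew by 200 kB
```

The first call prints network and the second prints cache. Other traffic on the interface inflates the delta, so treat this as a rough check on a quiet link rather than a measurement.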

Anyways, that's a problem for me to figure out.

Back to your problem, I had something typed out concerning checking rxsum's and txsum's are turned off on
the interfaces, or at least see if that makes a difference, trying to use a disk type of nvme, and trying ng_bridge
w/ netgraph interfaces but now I'm concluding my house is made of glass -- Hah! -- so until I get my house in
order I'm going to refrain from providing details.

Sorry and thanks!
~Paul
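
For anyone who wants to try the checksum-offload idea anyway: in FreeBSD's ifconfig(8) the capability names are rxcsum and txcsum, and a leading minus turns a capability off. Below is a dry-run sketch that only prints the commands. The function name is made up, and ixl0 is simply the physical bridge member from John's ifconfig output, so adjust it for your own setup.

```sh
#!/bin/sh
# offload_off_cmds IFACE...: print (do not run) the ifconfig commands that
# would disable receive and transmit checksum offload on each interface.
offload_off_cmds() {
    for ifn in "$@"; do
        echo "ifconfig $ifn -rxcsum -txcsum"
    done
}

offload_off_cmds ixl0
```

Review the printed command and run it as root if it looks right; `ifconfig ixl0 rxcsum txcsum` turns both back on. Whether this changes the NFS numbers at all is something only a measurement can answer.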

Best regards,
Zhenlei