From: Vitaliy Gusev <gusev.vitaliy@gmail.com>
To: Matthew Grooms <mgrooms@shrew.net>
Cc: virtualization@freebsd.org
Subject: Re: bhyve disk performance issue
Date: Thu, 29 Feb 2024 00:02:45 +0300
Message-Id: <3850080E-EBD1-4414-9C4E-DD89611C9F58@gmail.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

> On 28 Feb 2024, at 23:03, Matthew Grooms <mgrooms@shrew.net> wrote:
>
> ...
> The virtual disks were provisioned with either a 128G disk image or a 1TB raw partition, so I don't think space was an issue.
>
> Trim is definitely not an issue. I'm using a tiny fraction of the 32TB array and have tried both heavily under-provisioned HW RAID10 and SW RAID10 using GEOM. The latter was tested after sending full trim resets to all drives individually.
>

It could be that TRIM/UNMAP is not being used, in which case a zvol (for instance) fills up over time. ZFS then treats all of its blocks as in use, and write operations can suffer. I believe this was recently fixed.

Also look at this path:

    GuestFS -> UNMAP -> bhyve -> Host-FS -> PhysicalDisk

The problem with UNMAP is that it can cause unpredictable slowdowns at any time, so I would suggest comparing results with UNMAP enabled and disabled in the guest.
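
For a quick check of whether discard/UNMAP is actually in play inside a Linux guest, something like the following can be used (device and mount point names are only examples, e.g. vda for virtio-blk or nvme0n1 for NVMe):

    # non-zero means the virtual disk advertises discard support
    cat /sys/block/vda/queue/discard_max_bytes

    # is the filesystem mounted with online discard?
    findmnt -o TARGET,SOURCE,OPTIONS /

    # batched trim, as an alternative to the "discard" mount option
    fstrim -v /

    # for an "UNMAP disabled" run, remount without online discard and skip fstrim
    mount -o remount,nodiscard /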
> I will try to incorporate the rest of your feedback into my next round of testing. If I can find a benchmark tool that works with a raw block device, that would be ideal.
>

Use "dd" as a first step for read testing:

    ~# dd if=/dev/nvme0n2 of=/dev/null bs=1M status=progress iflag=direct
    ~# dd if=/dev/nvme0n2 of=/dev/null bs=1M status=progress

Compare the results with and without direct I/O.

Then the "fio" tool:

  1) write prepare:

    ~# fio --name=prep --rw=write --verify=crc32 --loops=1 --numjobs=2 --time_based --thread --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --group_reporting --size=20G --filename=/dev/nvme0n2

  2) read test:

    ~# fio --name=readtest --rw=read --loops=30 --numjobs=2 --time_based --thread --bs=256K --iodepth=32 --ioengine=libaio --direct=1 --group_reporting --size=20G --filename=/dev/nvme0n2

--
Vitaliy

> Thanks,
>
> -Matthew
>
>
>> --
>> Vitaliy
>>
>>> On 28 Feb 2024, at 21:29, Matthew Grooms <mgrooms@shrew.net> wrote:
>>>
>>> On 2/27/24 04:21, Vitaliy Gusev wrote:
>>>> Hi,
>>>>
>>>>> On 23 Feb 2024, at 18:37, Matthew Grooms <mgrooms@shrew.net> wrote:
>>>>>
>>>>>> ...
>>>>> The problem occurs when an image file is used on either ZFS or UFS. The problem also occurs when the virtual disk is backed by a raw disk partition or a ZVOL. This issue isn't related to a specific underlying filesystem.
>>>>>
>>>> Do I understand right that you ran the tests inside the guest VM on an ext4 filesystem? If so, you should be aware of the additional overhead compared to running the tests on the host.
>>>>
>>> Hi Vitaliy,
>>>
>>> I appreciate you providing the feedback and suggestions. I spent over a week trying as many combinations of host and guest options as possible to narrow this issue down to a specific host storage or guest device model option. Unfortunately the problem occurred with every combination I tested while running Linux as the guest. Note, I only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ). The problem did not occur when I ran FreeBSD as the guest. The problem did not occur when I ran KVM on the host and Linux as the guest.
>>>
>>>> I would suggest running fio (or even dd) on the raw disk device inside the VM, i.e. without a filesystem at all. Just do not forget to do "echo 3 > /proc/sys/vm/drop_caches" in the Linux guest VM before you run tests.
>>> The two servers I was using to test with are no longer available. However, I'll have two more identical servers arriving in the next week or so. I'll try to run additional tests and report back here. I used bonnie++ as that was easily installed from the package repos on all the systems I tested.
>>>
>>>> Could you also give more information about:
>>>>
>>>> 1. What results did you get (decode bonnie++ output)?
>>> If you look back at this email thread, there are many examples of running bonnie++ on the guest. I first ran the tests on the host system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a baseline of performance. Then I ran bonnie++ tests using bhyve as the hypervisor and Linux & FreeBSD as the guest. The combinations of host and guest storage options included ...
>>>
>>> 1) block device + virtio blk
>>> 2) block device + nvme
>>> 3) UFS disk image + virtio blk
>>> 4) UFS disk image + nvme
>>> 5) ZFS disk image + virtio blk
>>> 6) ZFS disk image + nvme
>>> 7) ZVOL + virtio blk
>>> 8) ZVOL + nvme
>>>
>>> In every instance, I observed the Linux guest disk IO often perform very well for some time after the guest was first booted. Then the performance of the guest would drop to a fraction of the original performance. The benchmark test was run every 5 or 10 minutes in a cron job. Sometimes the guest would perform well for up to an hour before performance would drop off. Most of the time it would only perform well for a few cycles ( 10 - 30 mins ) before performance would drop off. The only way to restore the performance was to reboot the guest. Once I determined that the problem was not specific to a particular host or guest storage option, I switched my testing to only use a block device as backing storage on the host to avoid hitting any system disk caches.
>>>
>>> Here is the test script I used in the cron job ...
>>>
>>> #!/bin/sh
>>> FNAME='output.txt'
>>>
>>> echo ================================================================================ >> $FNAME
>>> echo Begin @ `/usr/bin/date` >> $FNAME
>>> echo >> $FNAME
>>> /usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
>>> echo >> $FNAME
>>> echo End @ `/usr/bin/date` >> $FNAME
>>>
>>> As you can see, I'm calling bonnie++ with the system defaults. That uses a data set size that's 2x the guest RAM in an attempt to minimize the effect of filesystem cache on results. Here is an example of the output that bonnie++ produces ...
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
>>> Latency             11579us    535us   11889us    8597us   21819us    8238us
>>> Version  2.00       ------Sequential Create------ --------Random Create--------
>>> linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>>> Latency              7620us     126us    1648us     151us      15us     633us
>>>
>>> --------------------------------- speed drop ---------------------------------
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
>>> Latency             11902us   8959us   24711us   10185us   20884us    5831us
>>> Version  2.00       ------Sequential Create------ --------Random Create--------
>>> linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>>                  16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
>>> Latency               343us     165us    1636us     113us      55us    1836us
>>>
>>> In the example above, the benchmark test repeated about 20 times with results similar to the performance shown above the dotted line ( ~ 1.6g/s seq write and 1.3g/s seq read ). After that, the performance dropped to what's shown below the dotted line, which is less than 1/4 the original speed ( ~ 451m/s seq write and 402m/s seq read ).
>>>
>>>> 2. What results were you expecting?
>>>>
>>> What I expect is that, when I perform the same test with the same parameters, the results would stay more or less consistent over time. This is true when KVM is used as the hypervisor on the same hardware and guest options. That said, I'm not worried about bhyve being consistently slower than KVM or a FreeBSD guest being consistently slower than a Linux guest. I'm concerned that the performance drop over time is indicative of an issue with how bhyve interacts with non-FreeBSD guests.
>>>
>>>> 3. VM configuration, virtio-blk disk size, etc.
>>>> 4. Full command for tests (including size of test-set), bhyve, etc.
>>> I believe this was answered above. Please let me know if you have additional questions.
>>>
>>>> 5. Did you pass virtio-blk as 512 or 4K? If 512, you should probably try 4K.
>>>>
>>> The testing performed was not exclusively with virtio-blk.
>>>
>>>> 6. Linux has several read-ahead options for the IO scheduler, and it could be related too.
>>>>
>>> I suppose it's possible that bhyve could be somehow causing the disk scheduler in the Linux guest to act differently. I'll see if I can figure out how to disable that in future tests.
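
If it helps for the next round: the guest-side scheduler and read-ahead can be inspected and changed at runtime. A rough sketch, where the device names are only examples (vda for virtio-blk, nvme0n1 for NVMe):

    # the active scheduler is shown in brackets
    cat /sys/block/vda/queue/scheduler

    # take the guest I/O scheduler out of the picture for the test
    echo none > /sys/block/vda/queue/scheduler

    # read-ahead in KiB; 0 disables read-ahead for the test
    cat /sys/block/vda/queue/read_ahead_kb
    echo 0 > /sys/block/vda/queue/read_ahead_kb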
>>>> Additionally, could you also play with the "sync=disabled" volume/zvol option? Of course it is only for write testing.
>>> The testing performed was not exclusively with zvols.
>>>
>>> Once I have more hardware available, I'll try to report back with more testing. It may be interesting to also see how a Windows guest performs compared to Linux & FreeBSD. I suspect that this issue may only be triggered when a fast disk array is in use on the host. My tests use a 16x SSD RAID 10 array. It's also quite possible that the disk IO slowdown is only a symptom of another issue that's triggered by the disk IO test ( please see the end of my last post related to scheduler priority observations ). All I can say for sure is that ...
>>>
>>> 1) There is a problem and it's reproducible across multiple hosts
>>> 2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
>>> 3) It is not specific to any host or guest storage option
>>>
>>> Thanks,
>>>
>>> -Matthew
>>>
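
Regarding item 5 and the sync=disabled suggestion above, both are host-side settings. A rough sketch, where the slot number, zvol path and dataset name are only examples:

    # bhyve block-device option: expose the backing store with 4K logical sectors
    -s 3,virtio-blk,/dev/zvol/tank/vm0-disk0,sectorsize=4096

    # disable synchronous writes on the backing zvol for the write test,
    # then restore the inherited default afterwards
    zfs set sync=disabled tank/vm0-disk0
    zfs inherit sync tank/vm0-disk0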