From: Warner Losh
Date: Fri, 18 Feb 2022 09:31:13 -0700
Subject: Re: bio re-ordering
To: Peter Jeremy
Cc: FreeBSD FS, "freebsd-geom@FreeBSD.org"

So I spent some time looking at what BIO_ORDERED means in today's kernel,
flavored with what I internalized about the ordering guarantees of BIO
requests when I wrote the CAM I/O scheduler. It's kinda long, but it spells
out what BIO_ORDERED means, where it can come from, and who depends on it
for what.

On Fri, Feb 18, 2022 at 1:36 AM Peter Jeremy <peterj@freebsd.org> wrote:
> On 2022-Feb-17 17:48:14 -0800, John-Mark Gurney <jmg@funkthat.com> wrote:
> >Peter Jeremy wrote this message on Sat, Feb 05, 2022 at 20:50 +1100:
> >> I've raised https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261731 to
> >> make geom_gate support BIO_ORDERED.  Exposing the BIO_ORDERED flag to
> >> userland is quite easy (once a decision is made as to how to do that).
> >> Enhancing the geom_gate clients to correctly implement BIO_ORDERED is
> >> somewhat harder.
> >
> >The clients are single threaded wrt IOs, so I don't think updating them
> >are required.

> ggatec(8) and ggated(8) will not reorder I/Os.  I'm not sure about hast.

> >I do have patches to improve things by making ggated multithreaded to
> >improve IOPs, and so making this improvement would allow those patches
> >to be useful.

> Likewise, I found ggatec and ggated to be too slow for my purposes and
> so I've implemented my own variant (not network API compatible) that
> can/does reorder requests.  That was when I noticed that BIO_ORDERED
> wasn't implemented.

> >I do have a question though, what is the exact semantics of _ORDERED?
> I can't authoritatively answer this, sorry.

This is underdocumented. Clients, in general, are expected to cope with
I/O that completes in an arbitrary order. They are expected not to schedule
new I/O that depends on old I/O having completed for whatever reason
(usually on-media consistency). BIO_ORDERED is used to create a full
barrier in the stream of I/Os. The comments in the code say, vaguely:

/*
 * This bio must be executed after all previous bios in the queue have been
 * executed, and before any successive bios can be executed.
 */

Drivers implement this as a partitioning of requests: all requests before
the BIO_ORDERED operation are completed, then the ordered operation is
done, and only then are the requests after it scheduled with the device.

BIO_FLUSH is, I think, the only remaining operation that's done as
BIO_ORDERED directly. xen.../blkback.c, geom_io.c and ffs_softdep.c are the
only places that set it, and all on BIO_FLUSH operations. bio/buf clients
depend on this to ensure that metadata on the drive is in a consistent
state after it's been updated.

xen/.../blkback.c also sets it for all BLKIF_OP_WRITE_BARRIER operations
(so write barriers).
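
To make that concrete, here's roughly the shape of the geom_io.c path.
This is a from-memory sketch modeled on g_io_flush(), not a verbatim copy
(the example_io_flush name is mine); consult geom_io.c for the real thing:

#include <sys/param.h>
#include <sys/bio.h>
#include <geom/geom.h>

/*
 * Sketch: build a BIO_FLUSH bio, mark it BIO_ORDERED so it acts as a
 * full barrier in the I/O stream, send it down, and wait for it.
 */
static int
example_io_flush(struct g_consumer *cp)
{
	struct bio *bp;
	int error;

	bp = g_alloc_bio();
	bp->bio_cmd = BIO_FLUSH;
	bp->bio_flags |= BIO_ORDERED;	/* the barrier half of the story */
	bp->bio_offset = cp->provider->mediasize;
	bp->bio_length = 0;
	g_io_request(bp, cp);
	error = biowait(bp, "exflush");
	g_destroy_bio(bp);
	return (error);
}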

In the upper layers, we have struct buf instead of struct bio to describe
future I/Os that the buffer cache may need to do. There's a flag,
B_BARRIER, that gets turned into BIO_ORDERED in geom_vfs. B_BARRIER is set
in only two places (and copied in one other) in vfs_bio.c: babarrierwrite()
and bbarrierwrite(), for async and sync writes respectively.
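
Sketched from memory (the real code lives in vfs_bio.c and geom_vfs.c, and
the example_vfs_strategy name is mine), the two halves of that translation
look about like this:

#include <sys/param.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <geom/geom.h>

/* vfs_bio.c side: an async barrier write just tags the buffer... */
void
babarrierwrite(struct buf *bp)
{
	bp->b_flags |= B_ASYNC | B_BARRIER;
	(void)bwrite(bp);
}

/* ...and geom_vfs's strategy routine carries the tag over to the bio
 * it builds for the lower layers (heavily abridged): */
static void
example_vfs_strategy(struct bufobj *bo, struct buf *bp)
{
	struct bio *bip;

	bip = g_alloc_bio();
	bip->bio_cmd = (bp->b_iocmd == BIO_READ) ? BIO_READ : BIO_WRITE;
	bip->bio_offset = bp->b_iooffset;
	bip->bio_data = bp->b_data;
	bip->bio_length = bp->b_bcount;
	if (bp->b_flags & B_BARRIER) {
		bip->bio_flags |= BIO_ORDERED;
		bp->b_flags &= ~B_BARRIER;
	}
	/* ... g_io_request() on the consumer, completion glue, etc. ... */
}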

CAM will set BIO_ORDERED for all BIO_ZONE commands, for reasons that are
at best unclear to me but which won't matter for this discussion.
ffs_alloc.c (so UFS again) is the only place that uses babarrierwrite(). It
is used to ensure that all inode initializations are completed before the
cylinder group bitmap is written out. This is done by newfs, when new
cylinder groups are created with growfs, and apparently in a few other
cases where additional inodes are created in newly-created UFS2
filesystems. It can be disabled with vfs.ffs.doasyncinodeinit=0 when
barrier writes aren't working as advertised, but there's a big performance
hit from doing so until all the inodes for the filesystem have been lazily
populated.

No place that I can find uses bbarrierwrite().

Based on all of that, CAM's dynamic I/O scheduler will reorder reads around
a BIO_ORDERED operation, but not writes, trims, or flushes. Since
operations in general complete in an arbitrary order, scheduling both a
read and a write for the same block at the same time will produce undefined
results.

Different drivers handle this differently. CAM will honor the BIO_ORDERED
flag by scheduling the I/O with an ordering tag so that the SCSI hardware
will properly order the result. The simpler ATA version will use a non-NCQ
request to force the proper ordering (since to send a non-NCQ request, you
have to drain the queue, do that one command, and then start up again). nvd
will just throw I/O at the device until it encounters a BIO_ORDERED
request; then it queues everything until all the current requests complete,
does the ordered request, and then does the rest of the queued I/O as if it
had just shown up.
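
In pseudo-driver form, the nvd-style approach looks about like this. This
is my paraphrase of the behavior just described, not nvd's actual source;
the example_* names are hypothetical, with example_hw_submit() standing in
for whatever hands a bio to the hardware:

#include <sys/param.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

struct example_softc {
	struct mtx	 lock;		/* mtx_init() at attach, elided */
	struct bio_queue deferred;	/* bios held behind the barrier;
					   TAILQ_INIT() at attach */
	struct bio	*barrier;	/* pending BIO_ORDERED bio */
	int		 inflight;	/* bios currently at the device */
};

static void example_hw_submit(struct example_softc *, struct bio *);

static void
example_strategy(struct example_softc *sc, struct bio *bp)
{
	mtx_lock(&sc->lock);
	if (sc->barrier != NULL) {
		/* A barrier is pending or in flight: queue behind it. */
		TAILQ_INSERT_TAIL(&sc->deferred, bp, bio_queue);
		mtx_unlock(&sc->lock);
		return;
	}
	if ((bp->bio_flags & BIO_ORDERED) != 0) {
		sc->barrier = bp;
		if (sc->inflight != 0) {
			/* Hold it until the device drains. */
			mtx_unlock(&sc->lock);
			return;
		}
	}
	sc->inflight++;
	mtx_unlock(&sc->lock);
	example_hw_submit(sc, bp);
}

static void
example_complete(struct example_softc *sc, struct bio *bp)
{
	struct bio_queue tmp;
	struct bio *np;

	mtx_lock(&sc->lock);
	sc->inflight--;
	if (bp == sc->barrier) {
		/* The ordered bio finished: release what piled up. */
		sc->barrier = NULL;
		TAILQ_INIT(&tmp);
		TAILQ_CONCAT(&tmp, &sc->deferred, bio_queue);
		mtx_unlock(&sc->lock);
		biodone(bp);
		while ((np = TAILQ_FIRST(&tmp)) != NULL) {
			TAILQ_REMOVE(&tmp, np, bio_queue);
			example_strategy(sc, np);
		}
		return;
	}
	if (sc->inflight == 0 && sc->barrier != NULL) {
		/* Device drained: now issue the held ordered bio. */
		np = sc->barrier;
		sc->inflight++;
		mtx_unlock(&sc->lock);
		biodone(bp);
		example_hw_submit(sc, np);
		return;
	}
	mtx_unlock(&sc->lock);
	biodone(bp);
}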

Most drivers use bioq_disksort(), which will queue the request to the end
of the bioq and mark things so that all I/Os after it ride in their own new
'elevator car' for its elevator-sort algorithm. This means that CAM's
normal ways of dequeuing requests will preserve ordering through the periph
driver's start routine (where the dynamic scheduler will honor it for
writes but not reads, while the default scheduler will honor it for both).
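
The consumer side of that queue is the familiar two-routine pattern below.
A sketch only: locking and wakeups are elided, and the exdisk_* and
example_hw_submit() names are hypothetical; bioq_init(), bioq_disksort(),
and bioq_takefirst() are the real APIs:

#include <sys/param.h>
#include <sys/bio.h>

static struct bio_queue_head exdisk_queue;	/* bioq_init() at attach */

/* Strategy side: bioq_disksort() inserts in elevator order, and a
 * BIO_ORDERED bio fences off a new "elevator car" so later bios
 * cannot sort ahead of it. */
static void
exdisk_strategy(struct bio *bp)
{
	bioq_disksort(&exdisk_queue, bp);
	/* ...wake the start routine... */
}

/* Start side: draining in queue order preserves the barrier. */
static void
exdisk_start(void)
{
	struct bio *bp;

	while ((bp = bioq_takefirst(&exdisk_queue)) != NULL)
		example_hw_submit(bp);	/* hypothetical */
}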

> >And right now, the ggate protocol (from what I remember) doesn't have
> >a way to know when the remote kernel has received notification that an
> >IO is complete.

> A G_GATE_CMD_START write request will be sent to the remote system and
> issued as a pwrite(2) then an acknowledgement packet will be returned
> and passed back to the local kernel via G_GATE_CMD_DONE.  There's no
> support for BIO_FLUSH or BIO_ORDERED so there's no way for the local
> kernel to know when the write has been written to non-volatile store.
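
For reference, the shape of that round trip on the daemon side, sketched
from my memory of the g_gate ioctl interface and collapsed to a local
backing file for brevity (serve_one_request is a made-up name; the struct
field names are as I remember them, so check <geom/gate/g_gate.h> and
ggatel(8)/ggated(8) before trusting any of this):

#include <sys/ioctl.h>
#include <sys/bio.h>
#include <geom/gate/g_gate.h>

#include <err.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * One request/acknowledge round trip. ctlfd is /dev/ggctl, datafd is
 * the backing store; buf/bufsize come from the caller. Note the gap
 * discussed above: no fsync() for BIO_FLUSH, no BIO_ORDERED to honor.
 */
static void
serve_one_request(int ctlfd, int datafd, int unit, void *buf, size_t bufsize)
{
	struct g_gate_ctl_io ggio;

	memset(&ggio, 0, sizeof(ggio));
	ggio.gctl_version = G_GATE_VERSION;
	ggio.gctl_unit = unit;
	ggio.gctl_data = buf;
	ggio.gctl_length = bufsize;

	/* Blocks until the kernel has a bio for us. */
	if (ioctl(ctlfd, G_GATE_CMD_START, &ggio) == -1)
		err(1, "G_GATE_CMD_START");

	switch (ggio.gctl_cmd) {
	case BIO_READ:
		if (pread(datafd, ggio.gctl_data, ggio.gctl_length,
		    ggio.gctl_offset) == -1)
			ggio.gctl_error = errno;
		break;
	case BIO_WRITE:
		if (pwrite(datafd, ggio.gctl_data, ggio.gctl_length,
		    ggio.gctl_offset) == -1)
			ggio.gctl_error = errno;
		break;
	default:
		ggio.gctl_error = EOPNOTSUPP;
		break;
	}

	/* Acknowledge; the kernel then completes the original bio. */
	if (ioctl(ctlfd, G_GATE_CMD_DONE, &ggio) == -1)
		err(1, "G_GATE_CMD_DONE");
}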

That's unfortunate. UFS can work around the BIO_ORDERED problem with a
simple setting, but not the BIO_FLUSH problem.

> >> I've done some experiments and OpenZFS doesn't generate BIO_ORDERED
> >> operations so I've also raised https://github.com/openzfs/zfs/issues/13065
> >> I haven't looked into how difficult that would be to fix.

> Unrelated to the above but for completeness:  OpenZFS avoids the need
> for BIO_ORDERED by not issuing additional I/Os until previous I/Os have
> been retired when ordering is important.  (It does rely on BIO_FLUSH).

To be clear: OpenZFS won't schedule new I/Os until the BIO_FLUSH it sends
down without the BIO_ORDERED flag completes, right? The parenthetical
confuses me as to how to parse it: either BIO_FLUSH is needed and ZFS
depends on it completing with all blocks flushed to stable media, or ZFS
depends on BIO_FLUSH being strongly ordered relative to other commands. I
think you mean the former, but I want to make sure.

The root of this problem, I think, is the following:

     % man 9 bio
     No manual entry for bio

I think I'll have to massage this email into an appropriate man page.

At the very least, I should turn some/all of the above into a blog post :)

Warner