Re: bio re-ordering

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 02 Feb 2022 09:14:40 UTC
On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon <avg@freebsd.org> wrote:

> On 02/02/2022 09:58, Warner Losh wrote:
> >
> >
> > On Wed, Feb 2, 2022, 12:49 AM Peter Jeremy <peterj@freebsd.org> wrote:
> >
> >     Thanks all for the very prompt responses.
> >
> >     On 2022-Jan-28 22:32:02 -0700, Warner Losh <imp@bsdimp.com> wrote:
> >      >I think that ufs relies on two ordering primitives, both marked with
> >      >BIO_ORDERED today.
> >      >That's what most of the drivers key off of. We always set BIO_ORDERED on
> >      >all the BIO_FLUSH events as far as I can tell.
> >
> >     Thanks for that warning.  I don't think geom_gate understands either
> >     B_BARRIER or BIO_ORDERED.  I shall have a closer look.
> >
> >
> > It needs to understand BIO_ORDERED.
> >
> >
> >      >to it. b*barrierwrite() sets this, and that's used in the ffs_alloc code.
> >
> >     In my case, I'm interested in ZFS rather than UFS, and it doesn't seem
> >     to set B_BARRIER or BIO_ORDERED, directly or indirectly.
> >
> >
> > I went hunting through ZFS for this years ago, and in the pre-OpenZFS code
> > they were used, but there were three layers of indirection that obscured it.
> > ZFS doesn't use the buffer cache, so B_BARRIER isn't relevant. I'll see if I
> > can find it with the new code.
> >
> > But if it never sets BIO_ORDERED, drivers are already reordering things.
> > That's all any other driver in the tree worries about...
>
> Hmm... it looks like both the old and new (Open)ZFS use the BIO_FLUSH command
> without the BIO_ORDERED flag.  Not sure if it happens to do the right thing
> anyway or not.
>

It's an unordered flush then. The flush will happen whenever. I have a vague
memory that ZFS only issues this command when there's no other I/O pending.
That would be the only way for it to be reliable with nvme, since the
BIO_FLUSH command isn't ordered without the BIO_ORDERED flag. So ggate needn't
do anything special for BIO_FLUSH, just for BIO_ORDERED. Otherwise, it's free
to reorder as it sees fit.
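
To make that rule concrete, here is a minimal userland sketch (not kernel
code, and not taken from ggate). The model_* names, the struct, and the flag
value are hypothetical stand-ins, not the definitions from sys/bio.h. It only
shows that the ordered flag, and not the command type, decides whether one
request may be started ahead of another:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in command codes and flag; values are illustrative only. */
enum model_cmd { MODEL_READ, MODEL_WRITE, MODEL_FLUSH };
#define MODEL_ORDERED 0x1u

struct model_bio {
	enum model_cmd cmd;	/* what the request does */
	unsigned flags;		/* MODEL_ORDERED or 0 */
};

/*
 * May 'later' be started before 'earlier' has completed?
 * Only the ordered flag creates a barrier. A flush without the
 * flag is treated like any other reorderable request.
 */
static bool
model_may_pass(const struct model_bio *earlier, const struct model_bio *later)
{
	assert(earlier != NULL && later != NULL);
	return ((earlier->flags | later->flags) & MODEL_ORDERED) == 0;
}
```

Under this model an unflagged flush can be reordered around a pending write,
while a flagged flush cannot pass it and cannot be passed by it.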

The CAM I/O scheduler takes a little bit of liberty here, btw. It interprets
BIO_ORDERED as applying only to BIO_WRITE and BIO_FLUSH, because if you
schedule both a read and a write, the results are undefined. nvd is stricter
about honoring the ordering.
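
The difference between the two interpretations can be sketched as two
predicates. This is a userland model of the behavior as described above, not
the actual cam_iosched or nvd code. All model_* names and values are
hypothetical, and the "strict" variant assumes the barrier applies to every
command type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in command codes and flag; values are illustrative only. */
enum model_cmd { MODEL_READ, MODEL_WRITE, MODEL_FLUSH };
#define MODEL_ORDERED 0x1u

struct model_bio {
	enum model_cmd cmd;
	unsigned flags;
};

/* True if either request carries the ordered flag. */
static bool
model_has_barrier(const struct model_bio *a, const struct model_bio *b)
{
	assert(a != NULL && b != NULL);
	return ((a->flags | b->flags) & MODEL_ORDERED) != 0;
}

/* Strict policy: an ordered request is a barrier for every command. */
static bool
model_may_pass_strict(const struct model_bio *earlier,
    const struct model_bio *later)
{
	return !model_has_barrier(earlier, later);
}

/* Relaxed policy: the barrier only constrains writes and flushes. */
static bool
model_may_pass_relaxed(const struct model_bio *earlier,
    const struct model_bio *later)
{
	/* Reads are never held back by, or behind, an ordered request. */
	if (earlier->cmd == MODEL_READ || later->cmd == MODEL_READ)
		return true;
	return !model_has_barrier(earlier, later);
}
```

With a read queued behind an ordered write, the relaxed predicate lets the
read go ahead while the strict one holds it back. For write/flush pairs the
two policies agree.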

Warner