Date: Thu, 17 Feb 2022 17:48:14 -0800
From: John-Mark Gurney
To: Peter Jeremy
Cc: FreeBSD FS, "freebsd-geom@FreeBSD.org"
Subject: Re: bio re-ordering
Message-ID: <20220218014814.GJ97875@funkthat.com>
List-Id: GEOM-specific discussions and implementations
List-Archive: https://lists.freebsd.org/archives/freebsd-geom

Peter Jeremy wrote this message on Sat, Feb 05, 2022 at 20:50 +1100:
> On 2022-Feb-02 11:49:44 +0200, Andriy Gapon wrote:
> >On 02/02/2022 11:14, Warner Losh wrote:
> >> On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon wrote:
> >>     Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> >>     without BIO_ORDERED flag.  Not sure if it happens to do the right thing anyway
> >>     or not.
> >>
> >>
> >> It's an unordered flush then. The flush will happen whenever. I have a vague
> >> memory that ZFS will only issue this command in cases where there's no other I/O
> >> pending.
> >
> >I think that there is still a potential problem that an earlier write request
> >might get re-ordered after the flush.
> >I think that we should add BIO_ORDERED for correctness.
>
> I've raised https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261731 to
> make geom_gate support BIO_ORDERED.  Exposing the BIO_ORDERED flag to
> userland is quite easy (once a decision is made as to how to do that).
> Enhancing the geom_gate clients to correctly implement BIO_ORDERED is
> somewhat harder.

The clients are single threaded wrt IOs, so I don't think updating them
is required.  I do have patches that make ggated multithreaded to improve
IOPS, so making this improvement would allow those patches to be useful.

I do have a question though: what are the exact semantics of _ORDERED?

Do all the previous IOs have to be ack'd/received by the kernel before
the _ORDERED bio is executed, OR, once ggated (for example) has received
notification that the writes before an _ORDERED bio have completed, can
it then execute the _ORDERED command w/o the other side having received
that notification?

The reason I ask is: if the connection is broken before the kernel
ack's the pre-_ORDERED bios, but after the _ORDERED bio has been
written, what are the implications?  I can think of a case where the
pre-_ORDERED bio and the _ORDERED bio overlap that might cause an issue.

Here is the scenario that I'm thinking of:

_WRITE 16 sectors at offset 0
_WRITE _ORDERED 16 sectors at offset 8
connection is now broken
ggate reconnects
kernel reissues both IOs.
_WRITE 16 sectors at offset 0
kernel crashes before the second _WRITE happens and needs to read the data.

We now have a situation where sectors 16-24 have "new" data, while
sectors 8-16 have "old" data on them, which may break what a FS thinks
is on disk.

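To make the scenario above concrete, here is a rough simulation of it. This is only a sketch: it is not ggate code, and the `Disk` class, the `runs` helper, and the "W1"/"W2" tags are all made up for illustration. It assumes a 24-sector toy disk where each sector just remembers which write touched it last.

```python
SECTORS = 24  # assumption: just large enough to cover both writes


class Disk:
    """Toy backing store: one tag per sector, naming the write that last hit it."""

    def __init__(self, nsectors):
        self.sectors = ["init"] * nsectors

    def write(self, offset, count, tag):
        # Overwrite `count` sectors starting at `offset` with `tag`.
        for i in range(offset, offset + count):
            self.sectors[i] = tag


def replay_scenario():
    disk = Disk(SECTORS)
    # First pass: both writes reach the remote disk, in order.
    disk.write(0, 16, "W1")   # _WRITE, 16 sectors at offset 0
    disk.write(8, 16, "W2")   # _WRITE _ORDERED, 16 sectors at offset 8
    # The connection breaks before the kernel sees either completion.
    # After the reconnect the kernel reissues both IOs, but only the
    # first one lands before the crash.
    disk.write(0, 16, "W1")
    return disk.sectors


def runs(sectors):
    """Collapse per-sector tags into (first, last, tag) runs, last inclusive."""
    out = []
    start = 0
    for i in range(1, len(sectors) + 1):
        if i == len(sectors) or sectors[i] != sectors[start]:
            out.append((start, i - 1, sectors[start]))
            start = i
    return out


if __name__ == "__main__":
    for first, last, tag in runs(replay_scenario()):
        print(f"sectors {first:2d}-{last:2d}: {tag}")
```

With inclusive sector numbers, the result is W1 data on sectors 0-15 and W2 data on sectors 16-23, i.e. the front half of the second write (sectors 8-15) has been rolled back to the older data while its back half is still the newer data.
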
And right now, the ggate protocol (from what I remember) doesn't have
a way to know when the remote kernel has received notification that an
IO is complete.

I guess this situation isn't any worse than it is right now w/o passing
the _ORDERED flag down though.

> I've done some experiments and OpenZFS doesn't generate BIO_ORDERED
> operations so I've also raised https://github.com/openzfs/zfs/issues/13065
> I haven't looked into how difficult that would be to fix.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."