From: Mark Millard
Date: Sat, 15 Apr 2023 13:25:56 -0700
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Cc: FreeBSD User, Charlie Li, Pawel Jakub Dawidek, Mateusz Guzik, dev-commits-src-main@freebsd.org, Current FreeBSD
Message-Id: <7963CD10-C44A-4C4D-B760-FA6E0A053FA9@yahoo.com>
In-Reply-To: <20230415180720.AC396404@slippy.cwsent.com>
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
To: Cy Schubert

On Apr 15, 2023, at 11:07, Cy Schubert wrote:

> In message <5A47F62D-0E78-4C3E-84C0-45EEB03C7640@yahoo.com>, Mark Millard writes:
>> On Apr 15, 2023, at 07:36, Cy Schubert wrote:
>>
>>> In message <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de>, FreeBSD User writes:
>>>> On Thu, 13 Apr 2023 22:18:04 -0700
>>>> Mark Millard wrote:
>>>>
>>>>> On Apr 13, 2023, at 21:44, Charlie Li wrote:
>>>>>
>>>>>> Mark Millard wrote:
>>>>>>> FYI: in my original report for a context that has never had
>>>>>>> block_cloning enabled, I reported BOTH missing files and
>>>>>>> file content corruption in the poudriere-devel bulk build
>>>>>>> testing. This predates:
>>>>>>> https://people.freebsd.org/~pjd/patches/brt_revert.patch
>>>>>>> but had the changes from:
>>>>>>> https://github.com/openzfs/zfs/pull/14739/files
>>>>>>> The files were missing from packages installed to be used
>>>>>>> during a port's build. No other types of examples of missing
>>>>>>> files happened. (But only 11 ports failed.)
>>>>>> I also don't have block_cloning enabled. "Missing files" prior to brt_revert may actually
>>>>>> be present, but as the corruption also messes with the file(1) signature, some tools like
>>>>>> ldconfig report them as missing.
>>>>>
>>>>> For reference, the specific messages that were not explicit
>>>>> null-byte complaints were (some shown with a little context):
>>>>>
>>>>>
>>>>> ===> py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
>>>>> ===> Installing existing package /packages/All/libxml2-2.10.3_1.pkg
>>>>> [CA72_ZFS] Installing libxml2-2.10.3_1...
>>>>> [CA72_ZFS] Extracting libxml2-2.10.3_1: .......... done
>>>>> ===> py39-lxml-4.9.2 depends on shared library: libxml2.so - found (/usr/local/lib/libxml2.so)
>>>>> . . .
>>>>> [CA72_ZFS] Extracting libxslt-1.1.37: .......... done
>>>>> ===> py39-lxml-4.9.2 depends on shared library: libxslt.so - found (/usr/local/lib/libxslt.so)
>>>>> ===> Returning to build of py39-lxml-4.9.2
>>>>> . . .
>>>>> ===> Configuring for py39-lxml-4.9.2
>>>>> Building lxml version 4.9.2.
>>>>> Building with Cython 0.29.33.
>>>>> Error: Please make sure the libxml2 and libxslt development packages are installed.
>>>>>
>>>>>
>>>>> [CA72_ZFS] Extracting libunistring-1.1: .......... done
>>>>> ===> libidn2-2.3.4 depends on shared library: libunistring.so - not found
>>>>>
>>>>>
>>>>> [CA72_ZFS] Extracting gmp-6.2.1: .......... done
>>>>> ===> mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
>>>>>
>>>>>
>>>>> ===> nettle-3.8.1 depends on shared library: libgmp.so - not found
>>>>> ===> Installing existing package /packages/All/gmp-6.2.1.pkg
>>>>> [CA72_ZFS] Installing gmp-6.2.1...
>>>>> the most recent version of gmp-6.2.1 is already installed
>>>>> ===> nettle-3.8.1 depends on shared library: libgmp.so - not found
>>>>> *** Error code 1
>>>>>
>>>>>
>>>>> autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
>>>>>
>>>>>
>>>>> checking for GNU M4 that supports accurate traces... configure: error: no acceptable m4 could be found in
>>>>> $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
>>>>> GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
>>>>> Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
>>>>>
>>>>>
>>>>> ld: error: /usr/local/lib/libblkid.a: unknown file type
>>>>>
>>>>>
>>>>> ===
>>>>> Mark Millard
>>>>> marklmi at yahoo.com
>>>>>
>>>>>
>>>>
>>>> Hello
>>>>
>>>> what is the recent status of fixing/mitigate this disastrous bug? Especially for those with the
>>>> new option enabled on ZFS pools. Any advice?
>>>>
>>>> In an act of precaution (or call it panic) I shutdown several servers to prevent irreversible
>>>> damages to databases and data storages. We face on one host with /usr/ports residing on ZFS
>>>> always errors on the same files created while staging (using portmaster, leaves the system
>>>> with noninstalled software, i.e. www/apache24 in our case). Deleting the work folder doesn't
>>>> seem to change anything, even when starting a scrubbing of the entire pool (RAIDZ1 pool) -
>>>> cause unknown, why it affects always the same files to be corrupted. Same with devel/ruby-gems.
>>>>
>>>> Poudriere has been shutdown for the time being to avoid further issues.
>>>>
>>>> Are there any advice to proceed apart from conserving the boxes via shutdown?
>>>>
>>>> Thank you ;-)
>>>> oh
>>>>
>>>>
>>>>
>>>> --
>>>> O. Hartmann
>>>
>>> With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded
>>> blocks. #14739" patch I didn't have any issues, except for email messages
>>> with corruption in my sent directory, nowhere else. I'm still investigating
>>> the email messages issue. IMO one is generally safe to run poudriere on the
>>> latest ZFS with the additional patch.
>>
>> My poudriere testing failed when I tested such (14739 included),
>> per what I reported, block_cloning never having been enabled.
>> Others have also reported poudriere bulk build failures absent
>> block_cloning being involved and 14739 being in place. My tests
>> do predate:
>>
>> https://people.freebsd.org/~pjd/patches/brt_revert.patch
>
> IIRC this patch doesn't build.
>
> My tree includes this patch. Pardon the cut&paste. This will not apply.
>
> diff --git a/sys/contrib/openzfs/module/zfs/dmu.c b/sys/contrib/openzfs/module/zfs/dmu.c985d833f58..cda1472a77aa 100644
> --- a/sys/contrib/openzfs/module/zfs/dmu.c
> +++ b/sys/contrib/openzfs/module/zfs/dmu.c
> @@ -2312,8 +2312,10 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t offset, uint64_t length,
>                  dl->dr_overridden_by.blk_phys_birth = 0;
>          } else {
>                  dl->dr_overridden_by.blk_birth = dr->dr_txg;
> -                dl->dr_overridden_by.blk_phys_birth =
> -                    BP_PHYSICAL_BIRTH(bp);
> +                if (!BP_IS_EMBEDDED(bp)) {
> +                        dl->dr_overridden_by.blk_phys_birth =
> +                            BP_PHYSICAL_BIRTH(bp);
> +                }
>          }
>
>          mutex_exit(&db->db_mtx);
>
>>
>> and I'm not sure if Cy's activity had brt_revert.patch in
>> place or not.
>
> I don't know if your poudriere has any residual file corruption or not.

I've only done the one test sequence that started from a context
predating the import of the openzfs update.
There was no prior corruption involved and no stage without
the source fixes involved.

> My poudriere working 100% ok and yours not indicates there may be something
> amiss with your poudriere tree.

The original media still predates the import and still is fine.
The snapshot of the transfer state to the test media has no
corruptions.

> Remember I rolled back to the last nightly
> snapshot whereas you did not.

You are confused: I had nothing to roll back as I started from
pre-import and jumped in one step to a build that had the commits
and 14739 with no intermediate steps. I then did the poudriere
bulk for a jail that had no packages, one I use for temporary
odd experiments that I clean out afterwards.

Please quit attributing to my activities things that were
not involved in my activities. It is likely confusing folks
about what I'm reporting.

I'll give a more detailed sequencing below, in case that
helps.

> I don't know the state of your poudriere
> tree. I know with 100% certainty that my tree is good.

Remember my sequence for that (as reported before, but with
a different presentation in case that helps):

A) Start with root on ZFS media that predates the openzfs
   import. Note: This is a bectl context that looks like:

# bectl list
BE            Active Mountpoint Space Created
main-CA72     NR     /          4.02G 2023-03-15 21:29
old-main-CA72 -      -          1.82G 2023-03-01 17:25

(Lots of stuff is outside the BE's, so common to both.)

It shows:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082

B) Establish a copy of the normal media on a 2nd media.

C) Use the 2nd media for all the following steps, leaving
   my normal environment alone.

D) Update git and /usr/main-src/ worktree to:
   main-n262122-2ef2c26f3f13-dirty

E) Also apply 14739.

F) buildworld buildkernel

G) Do a sequence that in overall effect deletes
   old-main-CA72 and renames main-CA72 to old-main-CA72
   and has the installkernel installworld results in
   a (new) main-CA72. (This is a normal update result
   for me.)

H) Boot the new main-CA72. It showed:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086

(Taken from the report I made at the time.)

I) Next I did the poudriere bulk run, for a jail with
   no packages present, even pkg needing to be built.

That poudriere bulk run had the 11 failures, 252 successes
and 213 skipped. It was based on USE_TMPFS=data instead of,
say, USE_TMPFS=all . This was in order to be sure there was
more file system activity. (I'd not be surprised if
USE_TMPFS=all would have finished fine because of having far
less zfs I/O involved.)

An odd aspect that may be relevant: I do bulk builds in a
manner that at times can have 100+ for one or more of the 3
load averages --on a system with just 16 cores (16 hardware
threads). (I do not have specific maximum observed load
average figures for the specific bulk run. But I could set
up a retest and get such if desired. I use a modified top to
get the "MaxObs" figures.)

At no point in that sequence is a rollback relevant! Please
quit assuming that I had ever built, installed, or booted
something from a middle time frame compared to what I've
explicitly reported. I did not, as I've reported multiple
times. But you seem to forget or not believe such each time.

>>
>> Others' notes include Mateusz Guzik's:
>>
>> https://lists.freebsd.org/archives/dev-commits-src-main/2023-April/014534.html
>
> My tree included this patch + pjd@'s last patch on people.freebsd.org.
>>
>> which said:
>>
>> QUOTE
>> There is corruption with the recent import, with the
>> https://github.com/openzfs/zfs/pull/14739/files patch applied and
>> block cloning disabled on the pool.
>
> I had zero poudriere corruption with this patch.

But I did.

> My only corruption was in
> my sent-items in my MH mail directory, which I think was due to email
> threads already containing nulls.
>
>>
>> There is no corruption with top of main with zfs merge reverted altogether.
>>
>> Which commit results in said corruption remains to be seen, a variant
>> of the tree with just block cloning support reverted just for testing
>> purposes is about to be evaluated.
>> END QUOTE
>>
>> Charlie Li's later related notes that help interpret that were in:
>>
>> https://lists.freebsd.org/archives/dev-commits-src-main/2023-April/014545.html
>>
>> QUOTE
>> Testing with mjg@ earlier today revealed that block_cloning was not the
>> cause of poudriere bulk build (and similar cp(1)/install(1)-based)
>> corruption, although may have exacerbated it.
>> END QUOTE
>>
>> Mateusz later indicated he hoped to have it sorted out sometime
>> Friday for what the cause(s) were:
>>
>> https://lists.freebsd.org/archives/dev-commits-src-main/2023-April/014551.html
>>
>> QUOTE
>> I'm going to narrow down the non-blockcopy corruption after my testjig
>> gets off the ground.
>>
>> Basically I expect to have it sorted out on Friday.
>> END QUOTE
>>
>> But the lack of later related messages suggests that did not happen.
>>
>>> My tests of the additional patch
>>
>> (I'm guessing that is a reference to 14739, not to brt_revert.patch .)
>>
>>> concluded that it resolved my last
>>> problems, except for the sent email problem I'm still investigating. I'm
>>> sure there's a simple explanation for it, i.e.
>>> the email thread was
>>> corrupted by the EXDEV regression which cannot be fixed by anything, even
>>> reverting to the previous ZFS -- the data in those files will remain
>>> damaged regardless.
>>
>> Again: my test jumped from prior to the import to after the EXDEV
>> changes, including having 14739. I still had poudriere bulk
>> produce file corruptions.
>>
>>> I cannot speak to the others who have had poudriere and other issues. I
>>> never had any problems with poudriere on top of the new ZFS.
>>
>> Part of the mess is the variability. As I remember, I had 252
>> ports build fine in my test before the 11th failure meant that
>> the rest (213) had all been classified as skipped.
>>
>> It is not like most of the port builds failed: relatively uncommon.
>>
>> Also, one port built on a retry, indicating random/racy behavior
>> is involved. (The original failure was not from a file from
>> installing build dependencies but something that the builder
>> generated during the build. The 2nd try did not fail there or
>> anywhere.)
>>
>>> WRT reverting block_cloning pools to without, your only option is to backup
>>> your pool and recreate it without block_cloning. Then restore your data.
>>>
>>
>> Given what has been reported by multiple people and
>> Cy's own example of unexplained corruptions in email
>> handling, I'd be cautious risking important data
>> until reports from testing environment activity
>> consistently report not having corruptions.
>
> The "unexplained" email corruptions occurred in only the threads that
> already had corruption.

Good to know.

> I haven't been able to reproduce it anywhere else.
> I will continue testing on Monday. I expect my testing to confirm this
> hypothesis.

Your evidence can not cancel what I and others have reported.
It is just more evidence of observed variability (that is,
as yet, unexplained as far as I know).
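
Several of the corruption reports in this thread mention unexpected
null bytes in files. For anyone wanting a rough first check of a
tree, a minimal sketch of a scan for long NUL runs follows. It is
illustrative only and was not part of the testing reported here: the
function names and the 512-byte threshold are assumptions, and many
healthy files (archives, binaries, sparse files) legitimately contain
NUL runs, so a hit is a hint to look closer, not proof of corruption.

```python
import os
import re

# Hypothetical threshold: flag files whose longest NUL run is at least
# this many bytes. The value is an assumption, not taken from the thread.
MIN_NUL_RUN = 512


def longest_nul_run(path, chunk_size=1 << 20):
    """Return the length of the longest run of consecutive NUL bytes."""
    longest = 0
    carry = 0  # NUL run continuing from the end of the previous chunk
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            rest = chunk.lstrip(b"\x00")
            leading = len(chunk) - len(rest)
            if not rest:
                # The whole chunk is NUL, so the current run keeps growing.
                carry += len(chunk)
                longest = max(longest, carry)
                continue
            # Close off a run that started in an earlier chunk.
            longest = max(longest, carry + leading)
            # Runs that lie entirely inside this chunk.
            for match in re.finditer(rb"\x00+", rest):
                longest = max(longest, match.end() - match.start())
            # Trailing NULs may continue into the next chunk.
            carry = len(rest) - len(rest.rstrip(b"\x00"))
    return longest


def suspicious_files(root, min_run=MIN_NUL_RUN):
    """List (path, longest_run) for regular files with a long NUL run."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                run = longest_nul_run(path)
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if run >= min_run:
                hits.append((path, run))
    return sorted(hits)
```

Calling suspicious_files() on a directory returns the flagged paths
with their longest run lengths; comparing that list between a known
good copy and a test copy is more informative than either list alone.
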

>> Another thing: my activity does not include any testing
>> of the suggestion in:
>>
>> https://lists.freebsd.org/archives/dev-commits-src-main/2023-April/014607.html
>>
>> to use "-o sync=disabled" in a clone, reporting:
>
> This is a different issue. We need a core dump to resolve this. I'll test
> this on my sandbox on Monday.
>
> We can now reproduce this panic by hand.

My sequence has apparently avoided what leads to panics so
my only contribution to evidence about them is the lack of
getting any panics for my sequence.

> If there is no panic a diff -qr
> will confirm/deny this bug.

Up to possible lack of reproducible builds for the activity
done on the clone? Some files might be expected to be
different if I understand the sequence that was being
suggested.

>>
>> QUOTE
>> With this workaround I was able to build thousands of packages without
>> panics or failures due to data corruption.
>> END QUOTE
>>
>> If reliable, that consequence to the change might help
>> folks that are trying to isolate the problem(s) figure
>> out what is involved.
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>
> IMO we've had a lack of systematic testing of the various bugs.

What about my sequence would be an example of lack of
being systematic?

> The fact
> that this has caused some corrupt files has led to human panic over the
> issue.

I've no personal panic: the only context with the corruptions
for me was a specially created context to test to see if I'd
get corruption based on what was committed at the time (+1
separate patch). My standard/normal environment still predates
the import of the openzfs update and has no problems.

> Now that I've reverted my laptop to the old ZFS, the MH sent-email issue
> continues to exhibit itself. This is because the files I forward to myself
> already contain corrupt data.
> The old ZFS will not magically remove this.
> This testing is necessary to prove my hypothesis. I expect brand new email
> threads not to exhibit this problem with the new ZFS.

===
Mark Millard
marklmi at yahoo.com
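
The "diff -qr" style comparison discussed above can be sketched as a
content-hash comparison of two trees. This is only an illustrative
sketch under stated assumptions: the function names are hypothetical
and nothing like it was run for the tests reported in this thread. As
noted above, differences between two build trees can also come from
builds that are not reproducible, so a reported difference needs to
be inspected before it is called corruption.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # compare regular files only
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(left, right):
    """Return (only_left, only_right, differ) as sorted relative paths."""
    a = tree_digests(left)
    b = tree_digests(right)
    only_left = sorted(set(a) - set(b))
    only_right = sorted(set(b) - set(a))
    differ = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_left, only_right, differ
```

Unlike "diff -qr", this sketch ignores symlinks and file metadata and
only reports which regular files are missing or differ in content.
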