From nobody Thu Apr 13 13:33:21 2023
Date: Thu, 13 Apr 2023 06:33:21 -0700
From: Cy Schubert
To: Paweł Jakub Dawidek
Cc: Mark Millard, Mateusz Guzik,
    vishwin@freebsd.org, dev-commits-src-main@freebsd.org, Current FreeBSD, pjd@freebsd.org
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID: <20230413063321.60344b1f@cschubert.com>
References: <20230413071032.18BFF31F@slippy.cwsent.com>
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main

On Thu, 13 Apr 2023 19:54:42 +0900 Paweł Jakub Dawidek wrote:
> On Apr 13, 2023, at 16:10, Cy Schubert wrote:
> >
> > In message <20230413070426.8A54F25A@slippy.cwsent.com>, Cy Schubert writes:
> > In message <20230413064252.1E5C1318@slippy.cwsent.com>, Cy Schubert writes:
> >> In message , Mark Millard writes:
> >>> [This just puts my prior reply's material into Cy's
> >>>> adjusted resend of the original. The To/Cc should
> >>>> be complete this time.]
> >>>>
> >>>> On Apr 12, 2023, at 22:52, Cy Schubert wrote:
> >>>>
> >>>> In message , Mark Millard writes:
> >>>>> From: Charlie Li wrote on
> >>>>>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >>>>>>
> >>>>>> Charlie Li wrote:
> >>>>>>> Mateusz Guzik wrote:
> >>>>>>>> can you please test poudriere with
> >>>>>>>>> https://github.com/openzfs/zfs/pull/14739/files
> >>>>>>>>>
> >>>>>>>>> After applying, on the md(4)-backed pool regardless of block_cloning,
> >>>>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> >>>>>> report back on poudriere results (no block_cloning).
> >>>>>>>>
> >>>>>>>> As for poudriere, build failures are still rolling in. These are (and
> >>>>>> have been) entirely random on every run. Some examples from this run:
> >>>>
> >>>>>>> lang/php81:
> >>>>>>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> >>>>>>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> >>>>>> - consumers fail to build due to corrupted php.conf packaged
> >>>>>>>
> >>>>>>> devel/ninja:
> >>>>>>> - phase: stage
> >>>>>>> - install -s -m 555
> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> >>>>>>> - consumers fail to build due to corrupted bin/ninja packaged
> >>>>>>>
> >>>>>>> devel/netsurf-buildsystem:
> >>>>>>> - phase: stage
> >>>>>>> - mkdir -p
> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> >>>>>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> >>>>>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> >>>>>>> cp makefiles/$M
> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
> >>>>>> \
> >>>>>>> done
> >>>>>>> - graphics/libnsgif fails to build due to NUL characters in
> >>>>>>> Makefile.{clang,subdir}, causing nothing to link
> >>>>>>>
> >>>>>> Summary: I have problems building ports into packages
> >>>>>> via poudriere-devel use despite being fully updated/patched
> >>>>>> (as of when I started the experiment), never having enabled
> >>>>>> block_cloning ( still using openzfs-2.1-freebsd ).
> >>>>>>
> >>>>>> In other words, I can confirm other reports that have
> >>>>>> been made.
> >>>>>>
> >>>>>> The details follow.
> >>>>>>
> >>>>>>
> >>>>>> [Written as I was working on setting up for the experiments
> >>>>>> and then executing those experiments, adjusting as I went
> >>>>>> along.]
> >>>>>>
> >>>>>> I've run my own tests in a context that has never had the
> >>>>>> zpool upgrade and that jump from before the openzfs import to
> >>>>>> after the existing commits for trying to fix openzfs on
> >>>>>> FreeBSD. I report on the sequence of activities getting to
> >>>>>> the point of testing as well.
> >>>>>>
> >>>>>> By personal policy I keep my (non-temporary) pools compatible
> >>>>>> with what the most recent ??.?-RELEASE supports, using
> >>>>>> openzfs-2.1-freebsd for now. The pools involved below have
> >>>>>> never had a zpool upgrade from where they started. (I've no
> >>>>>> pools that have ever had a zpool upgrade.)
> >>>>>>
> >>>>>> (Temporary pools are rare for me, such as this investigation.
> >>>>>> But I'm not testing block_cloning or anything new this time.)
> >>>>>>
> >>>>>> I'll note that I use zfs for bectl, not for redundancy. So
> >>>>>> my evidence is more limited in that respect.
> >>>>>>
> >>>>>> The activities were done on a HoneyComb (16 Cortex-A72 cores).
> >>>>>> The system has and supports ECC RAM, 64 GiBytes of RAM are
> >>>>>> present.
> >>>>>>
> >>>>>> I started by duplicating my normal zfs environment to an
> >>>>>> external USB3 NVMe drive and adjusting the host name and such
> >>>>>> to produce the below. (Non-debug, although I do not strip
> >>>>>> symbols.) :
> >>>>>>
> >>>>>> # uname -apKU
> >>>>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
> >>>>>>
> >>>>>> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
> >>>>>> my normal procedure. I then also applied the patch from:
> >>>>>>
> >>>>>> https://github.com/openzfs/zfs/pull/14739/files
> >>>>>>
> >>>>>> Then I did: buildworld buildkernel, install them, and rebooted.
> >>>>>>
> >>>>>> The result was:
> >>>>>>
> >>>>>> # uname -apKU
> >>>>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
> >>>>>>
> >>>>>> The later poudriere-devel based build of packages from ports is
> >>>>>> based on:
> >>>>>>
> >>>>>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
> >>>>>> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
> >>>>>> Author:     John Baldwin
> >>>>>> Commit:     John Baldwin
> >>>>>> CommitDate: 2023-03-25 00:06:40 +0000
> >>>>>> branch: main
> >>>>>> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
> >>>>>> merge-base: CommitDate: 2023-03-25 00:06:40 +0000
> >>>>>> n613214 (--first-parent --count for merge-base)
> >>>>>>
> >>>>>> poudriere attempted to build 476 packages, starting
> >>>>>> with pkg (in order to build the 56 that I explicitly
> >>>>>> indicate that I want). It is my normal set of ports.
> >>>>>> The form of building is biased to allowing a high
> >>>>>> load average compared to the number of hardware
> >>>>>> threads (same as cores here): each builder is allowed
> >>>>>> to use the full count of hardware threads. The build
> >>>>>> €ÏL€€€€‹
> >> used USE_TMPFS="data" instead of the USE_TMPFS=all I
> >> normally use on the build machine involved.
> >>>>>>
> >>>>>> And it produced some random errors during the attempted
> >>>>>> builds.
> >>>>>> A type of example that is easy to interpret
> >>>>>> without further exploration is:
> >>>>>>
> >>>>>> pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse
> >>>> error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
> >>>>>>     0
> >>         da0p8     ONLINE       0     0     0
> >>>>>>
> >>>>>> errors: No known data errors
> >>>>>>
> >>>>>>
> >>>>>> ===
> >>>>>> Mark Millard
> >>>>>> marklmi at yahoo.com
> >>>>>>
> >>>>>
> >>>>> Let's try this again. Claws-mail didn't include the list address in the
> >>>> header. Trying to reply, again, using exmh instead.
> >>>>>
> >>>>>
> >>>>> Did your pools suffer the EXDEV problem? The EXDEV also corrupted files.
> >>>>
> >>>> As I reported, this was a jump from before the import
> >>>> to as things are tonight (here). So: NO, unless the
> >>>> existing code as of tonight still has the EXDEV problem!
> >>>>
> >>>> Prior to this experiment I'd not progressed any media
> >>>> beyond: main-n261544-cee09bda03c8-dirty Wed Mar 15 20:25:49.
> >>>>
> >>>> I think, without sufficient investigation we risk jumping to
> >>>>> conclusions. I've taken an extremely cautious approach, rolling back
> >>>>> snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
> >>>>> corruption was encountered.
> >>>>>
> >>>> Again: nothing between main-n261544-cee09bda03c8-dirty and
> >>>> main-n262122-2ef2c26f3f13-dirty was involved at any stage.
> >>>>
> >>>>
> >>>>> I did not rollback any snapshots in my MH mail directory. Rolling back
> >>>>> snapshots of my MH maildir would result in loss of email. I have to
> >>>>> live with that corruption.
> >>>>> Corrupted files in my outgoing sent email
> >>>>> directory remain:
> >>>>>
> >>>>> slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1
> >>>>> 53
> >>>>> slippy$
> >>>>>
> >>>>> There are 53 corrupted files in my note log of 9913 emails. Those files
> >>>> will never be fixed. They were corrupted by the EXDEV bug. Any new ZFS
> >>>>> or ZFS patches cannot retroactively remove the corruption from those
> >>>>> files.
> >>>>>
> >>>>> But my poudriere files, because the snapshots were rolled back, were
> >>>>> "repaired" by the rolled back snapshots.
> >>>>>
> >>>>> I'm not convinced that there is presently active corruption since
> >>>>> the problem has been fixed. I am convinced that whatever corruption
> >>>>> that was written at the time will remain forever or until those files
> >>>>> are deleted or replaced -- just like my email files written to disk at
> >>>>> the time.
> >>>>>
> >>>> My test results and procedure just do not fit your conclusion
> >>>> that things are okay now if block_cloning is completely avoided.
> >>>>
> >>> Admitting I'm wrong: sending copies of my last reply to you back to myself,
> >>>
> >> again and again, three times, I've managed to reproduce the corruption you
> >>> are talking about.
> >>>
> >> This email itself was also corrupted. Below is what was sent. Good thing
> >> multiple copies are saved by exmh.
> >>
> >> Admitting I'm wrong: sending copies of my last reply to you back to myself,
> >> again and again, three times, I've managed to reproduce the corruption you
> >> are talking about.
> >>
> > This email itself was also corrupted. Below is what was sent. Good thing
> > multiple copies are saved by exmh.
> >
> > Admitting I'm wrong: sending copies of my last reply to you back to myself,
> > again and again, three times, I've managed to reproduce the corruption you
> > are talking about.
> >
> > From my previous email to you.
> >
> > header. Trying to reply:::::::::, again, using exmh instead.
> >                        ^^^^^^^^^
> > Here it is, nine additional bytes of garbage. I've replaced the garbage
> > with colons because nulls mess up a lot of things, including cut&paste.
> >
> > In another instance about 500 bytes were removed. I can reproduce the
> > corruption at will now.
> >
> > The EXDEV patch is applied. Block_cloning is disabled.
> >
> > Somehow nulls and other garbage are inserted in the middle of emails after
> > the ZFS upgrade.
> >
> Can you please try this patch:
>
> github.com

The patch was applied yesterday at noon (PDT).

>
>
>
> Unfortunately I don’t see how this can happen with block cloning disabled.

It does and it's reproducible.

>
> --
> Paweł Jakub Dawidek
>

--
Cheers,
Cy Schubert
FreeBSD UNIX: Web: https://FreeBSD.org
NTP: Web: https://nwtime.org

e^(i*pi)+1=0
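
For anyone wanting to repeat the NUL-byte check quoted above without ugrep, here is a minimal sketch in Python. It is not the command used in this thread; `files_with_nul` is a hypothetical helper name, and the presence of a NUL byte is only a heuristic for this kind of corruption (binary attachments and other non-text files will also match).

```python
import os


def files_with_nul(root):
    """Return sorted paths of regular files under root containing a NUL byte.

    Hypothetical helper for illustration: a NUL byte in a plain-text mail
    file is treated as a sign of corruption, which is only a heuristic.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, and the like
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large files are not loaded whole.
                while True:
                    chunk = fh.read(1 << 20)
                    if not chunk:
                        break
                    if b"\x00" in chunk:
                        hits.append(path)
                        break
    return sorted(hits)
```

Calling `len(files_with_nul(os.path.expanduser("~/.Mail/note")))` would give a count comparable to the `grep -c` figure above, assuming that path is a directory of one message per file.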