From nobody Thu Apr 13 13:48:26 2023
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
In-Reply-To: <20230413063321.60344b1f@cschubert.com>
References: <20230413071032.18BFF31F@slippy.cwsent.com> <20230413063321.60344b1f@cschubert.com>
From: Mateusz Guzik
Date: Thu, 13 Apr 2023 15:48:26 +0200
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
To: Cy Schubert
Cc: "Paweł Jakub Dawidek", Mark Millard, vishwin@freebsd.org,
 dev-commits-src-main@freebsd.org, Current FreeBSD, pjd@freebsd.org
Content-Type: text/plain; charset="UTF-8"

On 4/13/23, Cy Schubert wrote:
> On Thu, 13 Apr 2023 19:54:42 +0900
> Paweł Jakub Dawidek wrote:
>
>> On Apr 13, 2023, at 16:10, Cy Schubert wrote:
>> >
>> > In message <20230413070426.8A54F25A@slippy.cwsent.com>, Cy Schubert writes:
>> > In message <20230413064252.1E5C1318@slippy.cwsent.com>, Cy Schubert writes:
>> >> In message , Mark Millard writes:
>> >>> [This just puts my prior reply's material into Cy's
>> >>>> adjusted resend of the original. The To/Cc should
>> >>>> be complete this time.]
>> >>>>
>> >>>> On Apr 12, 2023, at 22:52, Cy Schubert wrote:
>> >>>>
>> >>>> In message , Mark Millard writes:
>> >>>>> From: Charlie Li wrote on
>> >>>>>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
>> >>>>>>
>> >>>>>> Charlie Li wrote:
>> >>>>>>> Mateusz Guzik wrote:
>> >>>>>>>> can you please test poudriere with
>> >>>>>>>>> https://github.com/openzfs/zfs/pull/14739/files
>> >>>>>>>>>
>> >>>>>>>>> After applying, on the md(4)-backed pool regardless of block_cloning,
>> >>>>>> the cy@ `cp -R` test reports no differing (i.e. corrupted) files. Will
>> >>>>>> report back on poudriere results (no block_cloning).
>> >>>>>>>>
>> >>>>>>>> As for poudriere, build failures are still rolling in.
>> >>>>>>>> These are (and have been) entirely random on every run. Some examples from this run:
>> >>>>>>>
>> >>>>>>> lang/php81:
>> >>>>>>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
>> >>>>>>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
>> >>>>>> - consumers fail to build due to corrupted php.conf packaged
>> >>>>>>>
>> >>>>>>> devel/ninja:
>> >>>>>>> - phase: stage
>> >>>>>>> - install -s -m 555
>> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
>> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
>> >>>>>>> - consumers fail to build due to corrupted bin/ninja packaged
>> >>>>>>>
>> >>>>>>> devel/netsurf-buildsystem:
>> >>>>>>> - phase: stage
>> >>>>>>> - mkdir -p
>> >>>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
>> >>>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
>> >>>>>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
>> >>>>>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
>> >>>>>>> cp makefiles/$M
>> >>>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; \
>> >>>>>>> done
>> >>>>>>> - graphics/libnsgif fails to build due to NUL characters in
>> >>>>>>> Makefile.{clang,subdir}, causing nothing to link
>> >>>>>>>
>> >>>>>> Summary: I have problems building ports into packages
>> >>>>>> via poudriere-devel use despite being fully updated/patched
>> >>>>>> (as of when I started the experiment), never having enabled
>> >>>>>> block_cloning (still using openzfs-2.1-freebsd).
>> >>>>>>
>> >>>>>> In other words, I can confirm other reports that have
>> >>>>>> been made.
>> >>>>>>
>> >>>>>> The details follow.
>> >>>>>>
>> >>>>>> [Written as I was working on setting up for the experiments
>> >>>>>> and then executing those experiments, adjusting as I went
>> >>>>>> along.]
>> >>>>>>
>> >>>>>> I've run my own tests in a context that has never had the
>> >>>>>> zpool upgrade and that jumps from before the openzfs import to
>> >>>>>> after the existing commits for trying to fix openzfs on
>> >>>>>> FreeBSD. I report on the sequence of activities getting to
>> >>>>>> the point of testing as well.
>> >>>>>>
>> >>>>>> By personal policy I keep my (non-temporary) pools compatible
>> >>>>>> with what the most recent ??.?-RELEASE supports, using
>> >>>>>> openzfs-2.1-freebsd for now. The pools involved below have
>> >>>>>> never had a zpool upgrade from where they started. (I've no
>> >>>>>> pools that have ever had a zpool upgrade.)
>> >>>>>>
>> >>>>>> (Temporary pools are rare for me, such as this investigation.
>> >>>>>> But I'm not testing block_cloning or anything new this time.)
>> >>>>>>
>> >>>>>> I'll note that I use zfs for bectl, not for redundancy. So
>> >>>>>> my evidence is more limited in that respect.
>> >>>>>>
>> >>>>>> The activities were done on a HoneyComb (16 Cortex-A72 cores).
>> >>>>>> The system has and supports ECC RAM, 64 GiBytes of RAM are
>> >>>>>> present.
>> >>>>>>
>> >>>>>> I started by duplicating my normal zfs environment to an
>> >>>>>> external USB3 NVMe drive and adjusting the host name and such
>> >>>>>> to produce the below. (Non-debug, although I do not strip
>> >>>>>> symbols.) :
>> >>>>>>
>> >>>>>> # uname -apKU
>> >>>>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
>> >>>>>>
>> >>>>>> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
>> >>>>>> my normal procedure. I then also applied the patch from:
>> >>>>>>
>> >>>>>> https://github.com/openzfs/zfs/pull/14739/files
>> >>>>>>
>> >>>>>> Then I did: buildworld buildkernel, install them, and rebooted.
>> >>>>>>
>> >>>>>> The result was:
>> >>>>>>
>> >>>>>> # uname -apKU
>> >>>>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
>> >>>>>>
>> >>>>>> The later poudriere-devel based build of packages from ports is
>> >>>>>> based on:
>> >>>>>>
>> >>>>>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
>> >>>>>> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
>> >>>>>> Author: John Baldwin
>> >>>>>> Commit: John Baldwin
>> >>>>>> CommitDate: 2023-03-25 00:06:40 +0000
>> >>>>>> branch: main
>> >>>>>> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
>> >>>>>> merge-base: CommitDate: 2023-03-25 00:06:40 +0000
>> >>>>>> n613214 (--first-parent --count for merge-base)
>> >>>>>>
>> >>>>>> poudriere attempted to build 476 packages, starting
>> >>>>>> with pkg (in order to build the 56 that I explicitly
>> >>>>>> indicate that I want).
>> >>>>>> It is my normal set of ports.
>> >>>>>> The form of building is biased to allowing a high
>> >>>>>> load average compared to the number of hardware
>> >>>>>> threads (same as cores here): each builder is allowed
>> >>>>>> to use the full count of hardware threads. The build
>> >>>>>> €ÏL€€€€‹
>> >>>>>> used USE_TMPFS="data" instead of the
>> >>>>>> USE_TMPFS=all I normally use on the build machine involved.
>> >>>>>>
>> >>>>>> And it produced some random errors during the attempted
>> >>>>>> builds. A type of example that is easy to interpret
>> >>>>>> without further exploration is:
>> >>>>>>
>> >>>>>> pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
>> >>>>>> 0
>> >> da0p8 ONLINE 0 0 0
>> >>>>>>
>> >>>>>> errors: No known data errors
>> >>>>>>
>> >>>>>> ===
>> >>>>>> Mark Millard
>> >>>>>> marklmi at yahoo.com
>> >>>>>>
>> >>>>>
>> >>>>> Let's try this again. Claws-mail didn't include the list address in the
>> >>>> header. Trying to reply, again, using exmh instead.
>> >>>>>
>> >>>>> Did your pools suffer the EXDEV problem? The EXDEV also corrupted files.
>> >>>>
>> >>>> As I reported, this was a jump from before the import
>> >>>> to as things are tonight (here). So: NO, unless the
>> >>>> existing code as of tonight still has the EXDEV problem!
>> >>>>
>> >>>> Prior to this experiment I'd not progressed any media
>> >>>> beyond: main-n261544-cee09bda03c8-dirty Wed Mar 15 20:25:49.
>> >>>>
>> >>>> I think, without sufficient investigation we risk jumping to
>> >>>>> conclusions. I've taken an extremely cautious approach, rolling back
>> >>>>> snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
>> >>>>> corruption was encountered.
>> >>>>>
>> >>>> Again: nothing between main-n261544-cee09bda03c8-dirty and
>> >>>> main-n262122-2ef2c26f3f13-dirty was involved at any stage.
>> >>>>
>> >>>>
>> >>>>> I did not roll back any snapshots in my MH mail directory. Rolling back
>> >>>>> snapshots of my MH maildir would result in loss of email. I have to
>> >>>>> live with that corruption. Corrupted files in my outgoing sent email
>> >>>>> directory remain:
>> >>>>>
>> >>>>> slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1
>> >>>>> 53
>> >>>>> slippy$
>> >>>>>
>> >>>>> There are 53 corrupted files in my note log of 9913 emails. Those files
>> >>>> will never be fixed. They were corrupted by the EXDEV bug. Any new ZFS
>> >>>>> or ZFS patches cannot retroactively remove the corruption from those
>> >>>>> files.
>> >>>>>
>> >>>>> But my poudriere files, because the snapshots were rolled back, were
>> >>>>> "repaired" by the rolled back snapshots.
>> >>>>>
>> >>>>> I'm not convinced that there is presently active corruption since
>> >>>>> the problem has been fixed. I am convinced that whatever corruption
>> >>>>> that was written at the time will remain forever or until those files
>> >>>>> are deleted or replaced -- just like my email files written to disk at
>> >>>>> the time.
>> >>>>>
>> >>>> My test results and procedure just do not fit your conclusion
>> >>>> that things are okay now if block_cloning is completely avoided.
>> >>>>
>> >>> Admitting I'm wrong: sending copies of my last reply to you back to myself,
>> >>>
>> >> again and again, three times, I've managed to reproduce the corruption you
>> >>> are talking about.
>> >>>
>> >> This email itself was also corrupted. Below is what was sent. Good thing
>> >> multiple copies are saved by exmh.
>> >>
>> >> Admitting I'm wrong: sending copies of my last reply to you back to myself,
>> >> again and again, three times, I've managed to reproduce the corruption you
>> >> are talking about.
>> >>
>> > This email itself was also corrupted. Below is what was sent. Good thing
>> > multiple copies are saved by exmh.
>> >
>> > Admitting I'm wrong: sending copies of my last reply to you back to myself,
>> > again and again, three times, I've managed to reproduce the corruption you
>> > are talking about.
>> >
>> > From my previous email to you.
>> >
>> > header. Trying to reply:::::::::, again, using exmh instead.
>> >                        ^^^^^^^^^
>> > Here it is, nine additional bytes of garbage. I've replaced the garbage
>> > with colons because nulls mess up a lot of things, including cut&paste.
>> >
>> > In another instance about 500 bytes were removed. I can reproduce the
>> > corruption at will now.
>> >
>> > The EXDEV patch is applied. Block_cloning is disabled.
>> >
>> > Somehow nulls and other garbage are inserted in the middle of emails after
>> > the ZFS upgrade.
>> >
>> Can you please try this patch:
>>
>> github.com
>
> The patch was applied yesterday at noon (PDT).
>
>>
>> Unfortunately I don’t see how this can happen with block cloning
>> disabled.
>
> It does and it's reproducible.
>

There is corruption with the recent import, with the
https://github.com/openzfs/zfs/pull/14739/files patch applied and block
cloning disabled on the pool.

There is no corruption on top of main with the zfs merge reverted
altogether.

Which commit results in said corruption remains to be seen; a variant of
the tree with only block cloning support reverted, purely for testing
purposes, is about to be evaluated.

--
Mateusz Guzik