From: Mark Millard
Date: Wed, 12 Apr 2023 23:21:19 -0700
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
To: Cy Schubert
Cc: Mateusz Guzik, vishwin@freebsd.org, dev-commits-src-main@freebsd.org, Current FreeBSD
In-Reply-To: <20230413055221.E8B211F0@slippy.cwsent.com>
References: <20230413055221.E8B211F0@slippy.cwsent.com>
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main

[This just puts my prior reply's material into Cy's adjusted resend of the original.
The To/Cc should be complete this time.]

On Apr 12, 2023, at 22:52, Cy Schubert wrote:

> In message , Mark Millard writes:
>> From: Charlie Li wrote on
>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
>>
>>> Charlie Li wrote:
>>>> Mateusz Guzik wrote:
>>>>> can you please test poudriere with
>>>>> https://github.com/openzfs/zfs/pull/14739/files
>>>>>
>>>> After applying, on the md(4)-backed pool regardless of block_cloning,
>>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
>>>> report back on poudriere results (no block_cloning).
>>>>
>>> As for poudriere, build failures are still rolling in. These are (and
>>> have been) entirely random on every run. Some examples from this run:
>>>
>>> lang/php81:
>>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
>>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
>>> - consumers fail to build due to corrupted php.conf packaged
>>>
>>> devel/ninja:
>>> - phase: stage
>>> - install -s -m 555
>>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
>>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
>>> - consumers fail to build due to corrupted bin/ninja packaged
>>>
>>> devel/netsurf-buildsystem:
>>> - phase: stage
>>> - mkdir -p
>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
>>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
>>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
>>> cp makefiles/$M
>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
>>> \
>>> done
>>> - graphics/libnsgif fails to build due to NUL characters in
>>> Makefile.{clang,subdir}, causing nothing to link
>>
>> Summary: I have problems building ports into packages
>> via poudriere-devel use despite being fully updated/patched
>> (as of when I started the experiment), never having enabled
>> block_cloning ( still using openzfs-2.1-freebsd ).
>>
>> In other words, I can confirm other reports that have
>> been made.
>>
>> The details follow.
>>
>>
>> [Written as I was working on setting up for the experiments
>> and then executing those experiments, adjusting as I went
>> along.]
>>
>> I've run my own tests in a context that has never had the
>> zpool upgrade and that jumped from before the openzfs import to
>> after the existing commits for trying to fix openzfs on
>> FreeBSD. I report on the sequence of activities getting to
>> the point of testing as well.
>>
>> By personal policy I keep my (non-temporary) pools compatible
>> with what the most recent ??.?-RELEASE supports, using
>> openzfs-2.1-freebsd for now. The pools involved below have
>> never had a zpool upgrade from where they started. (I've no
>> pools that have ever had a zpool upgrade.)
>>
>> (Temporary pools are rare for me, such as this investigation.
>> But I'm not testing block_cloning or anything new this time.)
>>
>> I'll note that I use zfs for bectl, not for redundancy. So
>> my evidence is more limited in that respect.
>>
>> The activities were done on a HoneyComb (16 Cortex-A72 cores).
>> The system has and supports ECC RAM; 64 GiBytes of RAM are
>> present.
>>
>> I started by duplicating my normal zfs environment to an
>> external USB3 NVMe drive and adjusting the host name and such
>> to produce the below. (Non-debug, although I do not strip
>> symbols.) :
>>
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
>>
>> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
>> my normal procedure. I then also applied the patch from:
>>
>> https://github.com/openzfs/zfs/pull/14739/files
>>
>> Then I did: buildworld buildkernel, install them, and rebooted.
>>
>> The result was:
>>
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
>>
>> The later poudriere-devel based build of packages from ports is
>> based on:
>>
>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
>> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
>> Author: John Baldwin
>> Commit: John Baldwin
>> CommitDate: 2023-03-25 00:06:40 +0000
>> branch: main
>> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
>> merge-base: CommitDate: 2023-03-25 00:06:40 +0000
>> n613214 (--first-parent --count for merge-base)
>>
>> poudriere attempted to build 476 packages, starting
>> with pkg (in order to build the 56 that I explicitly
>> indicate that I want). It is my normal set of ports.
>> The form of building is biased to allowing a high
>> load average compared to the number of hardware
>> threads (same as cores here): each builder is allowed
>> to use the full count of hardware threads. The build
>> used USE_TMPFS="data" instead of the USE_TMPFS=all I
>> normally use on the build machine involved.
>>
>> And it produced some random errors during the attempted
>> builds. A type of example that is easy to interpret
>> without further exploration is:
>>
>> pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
>>
>> A fair number of errors are of the form: the build
>> installing a previously built package for use in the
>> builder but later the builder can not find some file
>> from the package's installation.
>>
>> Another error reported was:
>>
>> ld: error: /usr/local/lib/libblkid.a: unknown file type
>>
>> For reference:
>>
>> [main-CA72-bulk_a-default] [2023-04-12_20h45m32s] [committing:] Queued: 476 Built: 252 Failed: 11 Skipped: 213 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 00:37:52
>>
>> I started another build that tried to build 224 packages:
>> the 11 failed and 213 skipped.
>>
>> Just 1 package built that failed before:
>>
>> [00:04:58] [09] [00:04:15] Finished databases/sqlite3@default | sqlite3-3.41.0_1,1: Success
>>
>> It seems to be the only one where the original failure was not
>> an example of complaining about the missing/corrupted content
>> of a package install used for building. So it is an example
>> of randomly varying behavior.
>>
>> That, in turn, allowed:
>>
>> [00:04:58] [01] [00:00:00] Building security/nss | nss-3.89
>>
>> to build but everything else failed or was skipped.
>>
>> The sqlite3 vs. other failure difference suggests that writes
>> have random problems but later reads reliably see the problem
>> that resulted (before the content is deleted).
>>
>>
>> After the above:
>>
>> # zpool status
>>   pool: zroot
>>  state: ONLINE
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zroot       ONLINE       0     0     0
>>           da0p8     ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> # zpool scrub zroot
>> # zpool status
>>   pool: zroot
>>  state: ONLINE
>>   scan: scrub repaired 0B in 00:16:25 with 0 errors on Wed Apr 12 22:15:39 2023
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zroot       ONLINE       0     0     0
>>           da0p8     ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>
>
> Let's try this again. Claws-mail didn't include the list address in the
> header. Trying to reply, again, using exmh instead.
>
>
> Did your pools suffer the EXDEV problem? The EXDEV also corrupted files.

As I reported, this was a jump from before the import to as
things are tonight (here). So: NO, unless the existing code
as of tonight still has the EXDEV problem!

Prior to this experiment I'd not progressed any media
beyond: main-n261544-cee09bda03c8-dirty Wed Mar 15 20:25:49.

> I think, without sufficient investigation we risk jumping to
> conclusions. I've taken an extremely cautious approach, rolling back
> snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
> corruption was encountered.

Again: nothing between main-n261544-cee09bda03c8-dirty and
main-n262122-2ef2c26f3f13-dirty was involved at any stage.

>
> I did not roll back any snapshots in my MH mail directory. Rolling back
> snapshots of my MH maildir would result in loss of email. I have to
> live with that corruption. Corrupted files in my outgoing sent email
> directory remain:
>
> slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1
> 53
> slippy$
>
> There are 53 corrupted files in my note log of 9913 emails. Those files
> will never be fixed. They were corrupted by the EXDEV bug. Any new ZFS
> or ZFS patches cannot retroactively remove the corruption from those
> files.
>
> But my poudriere files, because the snapshots were rolled back, were
> "repaired" by the rolled back snapshots.
>
> I'm not convinced that there is presently active corruption since
> the problem has been fixed. I am convinced that whatever corruption
> that was written at the time will remain forever or until those files
> are deleted or replaced -- just like my email files written to disk at
> the time.

My test results and procedure just do not fit your conclusion
that things are okay now if block_cloning is completely avoided.

===
Mark Millard
marklmi at yahoo.com
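
For anyone wanting to repeat the kind of check discussed above (NUL bytes showing up in files that should be text, as with the `ugrep -cPa '\x00'` count or the NUL-filled Makefiles), here is a minimal Python sketch. It is not a tool from this thread: the function names `has_nul_run` and `find_nul_files` and the default run length of 8 (matching the eight `\x00` bytes in the pkg_resources error) are assumptions made for illustration.

```python
import os


def has_nul_run(path, min_run=8, chunk_size=1 << 20):
    """Return True if the file contains at least `min_run` consecutive NUL bytes.

    Hypothetical helper: reads in chunks and carries a short tail forward so
    that a run straddling two chunks is still detected.
    """
    needle = b"\x00" * min_run
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            data = tail + chunk
            if needle in data:
                return True
            # Keep the last min_run-1 bytes for the next iteration.
            tail = data[-(min_run - 1):] if min_run > 1 else b""


def find_nul_files(root, min_run=8):
    """Walk `root` and return a sorted list of regular files with a NUL run."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if has_nul_run(path, min_run):
                    hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than reported.
                continue
    return sorted(hits)


if __name__ == "__main__":
    import sys

    for hit in find_nul_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(hit)
```

This is only meaningful for trees that should hold text (mail folders, Makefiles, config files); binaries and archives legitimately contain NUL runs, so a hit there says nothing about corruption.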