Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Date: Thu, 13 Apr 2023 06:11:08 UTC
From: Cy Schubert <Cy.Schubert_at_cschubert.com> wrote on
Date: Thu, 13 Apr 2023 05:47:33 UTC :

> On Wed, 12 Apr 2023 22:28:13 -0700
> Mark Millard <marklmi@yahoo.com> wrote:
>
> > From: Charlie Li <vishwin_at_freebsd.org> wrote on
> > Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >
> > > Charlie Li wrote:
> > > > Mateusz Guzik wrote:
> > > >> can you please test poudriere with
> > > >> https://github.com/openzfs/zfs/pull/14739/files
> > > >>
> > > > After applying, on the md(4)-backed pool regardless of block_cloning,
> > > > the cy@ `cp -R` test reports no differing (i.e. corrupted) files. Will
> > > > report back on poudriere results (no block_cloning).
> > > >
> > > As for poudriere, build failures are still rolling in. These are (and
> > > have been) entirely random on every run. Some examples from this run:
> > >
> > > lang/php81:
> > > - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> > > ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> > > - consumers fail to build due to corrupted php.conf packaged
> > >
> > > devel/ninja:
> > > - phase: stage
> > > - install -s -m 555
> > > /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> > > /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> > > - consumers fail to build due to corrupted bin/ninja packaged
> > >
> > > devel/netsurf-buildsystem:
> > > - phase: stage
> > > - mkdir -p
> > > /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> > > /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> > > for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> > > Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> > > cp makefiles/$M
> > > /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
> > > \
> > > done
> > > - graphics/libnsgif fails to build due to NUL characters in
> > > Makefile.{clang,subdir}, causing nothing to link
> >
> > Summary: I have problems building ports into packages
> > via poudriere-devel use despite being fully updated/patched
> > (as of when I started the experiment), never having enabled
> > block_cloning (still using openzfs-2.1-freebsd).
> >
> > In other words, I can confirm other reports that have
> > been made.
> >
> > The details follow.
> >
> >
> > [Written as I was working on setting up for the experiments
> > and then executing those experiments, adjusting as I went
> > along.]
> >
> > I've run my own tests in a context that has never had the
> > zpool upgrade and that jumped from before the openzfs import to
> > after the existing commits for trying to fix openzfs on
> > FreeBSD. I report on the sequence of activities getting to
> > the point of testing as well.
> >
> > By personal policy I keep my (non-temporary) pools compatible
> > with what the most recent ??.?-RELEASE supports, using
> > openzfs-2.1-freebsd for now. The pools involved below have
> > never had a zpool upgrade from where they started. (I've no
> > pools that have ever had a zpool upgrade.)
> >
> > (Temporary pools are rare for me, such as this investigation.
> > But I'm not testing block_cloning or anything new this time.)
> >
> > I'll note that I use zfs for bectl, not for redundancy. So
> > my evidence is more limited in that respect.
> >
> > The activities were done on a HoneyComb (16 Cortex-A72 cores).
> > The system has and supports ECC RAM; 64 GiBytes of RAM are
> > present.
> >
> > I started by duplicating my normal zfs environment to an
> > external USB3 NVMe drive and adjusting the host name and such
> > to produce the below. (Non-debug, although I do not strip
> > symbols.) :
> >
> > # uname -apKU
> > FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
> >
> > I then did: git fetch, stash push ., merge --ff-only, stash apply . :
> > my normal procedure. I then also applied the patch from:
> >
> > https://github.com/openzfs/zfs/pull/14739/files
> >
> > Then I did: buildworld buildkernel, install them, and rebooted.
> >
> > The result was:
> >
> > # uname -apKU
> > FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
> >
> > The later poudriere-devel based build of packages from ports is
> > based on:
> >
> > # ~/fbsd-based-on-what-commit.sh -C /usr/ports
> > 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
> > Author: John Baldwin <jhb@FreeBSD.org>
> > Commit: John Baldwin <jhb@FreeBSD.org>
> > CommitDate: 2023-03-25 00:06:40 +0000
> > branch: main
> > merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
> > merge-base: CommitDate: 2023-03-25 00:06:40 +0000
> > n613214 (--first-parent --count for merge-base)
> >
> > poudriere attempted to build 476 packages, starting
> > with pkg (in order to build the 56 that I explicitly
> > indicate that I want). It is my normal set of ports.
> > The form of building is biased toward allowing a high
> > load average compared to the number of hardware
> > threads (same as cores here): each builder is allowed
> > to use the full count of hardware threads. The build
> > used USE_TMPFS="data" instead of the USE_TMPFS=all I
> > normally use on the build machine involved.
> >
> > And it produced some random errors during the attempted
> > builds. A type of example that is easy to interpret
> > without further exploration is:
> >
> > pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
> >
> > A fair number of errors are of the form: the build
> > installing a previously built package for use in the
> > builder but later the builder cannot find some file
> > from the package's installation.
> >
> > Another error reported was:
> >
> > ld: error: /usr/local/lib/libblkid.a: unknown file type
> >
> > For reference:
> >
> > [main-CA72-bulk_a-default] [2023-04-12_20h45m32s] [committing:] Queued: 476 Built: 252 Failed: 11 Skipped: 213 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 00:37:52
> >
> > I started another build that tried to build 224 packages:
> > the 11 failed and 213 skipped.
> >
> > Just 1 package built that failed before:
> >
> > [00:04:58] [09] [00:04:15] Finished databases/sqlite3@default | sqlite3-3.41.0_1,1: Success
> >
> > It seems to be the only one where the original failure was not
> > an example of complaining about the missing/corrupted content
> > of a package install used for building. So it is an example
> > of randomly varying behavior.
> >
> > That, in turn, allowed:
> >
> > [00:04:58] [01] [00:00:00] Building security/nss | nss-3.89
> >
> > to build but everything else failed or was skipped.
> >
> > The sqlite3 vs. other failure difference suggests that writes
> > have random problems but later reads reliably see the problem
> > that resulted (before the content is deleted).
> >
> >
> > After the above:
> >
> > # zpool status
> >   pool: zroot
> >  state: ONLINE
> > config:
> >
> >     NAME        STATE     READ WRITE CKSUM
> >     zroot       ONLINE       0     0     0
> >       da0p8     ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> > # zpool scrub zroot
> > # zpool status
> >   pool: zroot
> >  state: ONLINE
> >   scan: scrub repaired 0B in 00:16:25 with 0 errors on Wed Apr 12 22:15:39 2023
> > config:
> >
> >     NAME        STATE     READ WRITE CKSUM
> >     zroot       ONLINE       0     0     0
> >       da0p8     ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> >
> > ===
> > Mark Millard
> > marklmi at yahoo.com
>
> Did your pools suffer the EXDEV problem? The EXDEV also corrupted files.

As I reported, this was a jump from before the import to as
things are tonight (here). So: NO, unless the existing code
as of tonight still has the EXDEV problem!

Prior to this experiment I'd not progressed any media beyond:

main-n261544-cee09bda03c8-dirty Wed Mar 15 20:25:49.

> I think, without sufficient investigation, we risk jumping to
> conclusions. I've taken an extremely cautious approach, rolling back
> snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
> corruption was encountered.

Again: nothing between main-n261544-cee09bda03c8-dirty and
main-n262122-2ef2c26f3f13-dirty was involved at any stage.

> I did not roll back any snapshots in my MH mail directory. Rolling back
> snapshots of my MH maildir would result in loss of email. I have to
> live with that corruption. Corrupted files in my outgoing sent email
> directory remain:
>
> slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1
> 53
> slippy$
>
> There are 53 corrupted files in my note log of 9913 emails. Those files
> will never be fixed. They were corrupted by the EXDEV bug. Any new ZFS
> or ZFS patches cannot retroactively remove the corruption from those
> files.
>
> But my poudriere files, because the snapshots were rolled back, were
> "repaired" by the rolled back snapshots.
>
> I'm not convinced that there is presently active corruption since
> the problem has been fixed. I am convinced that whatever corruption
> that was written at the time will remain forever or until those files
> are deleted or replaced -- just like my email files written to disk at
> the time.

My test results and procedure just do not fit your conclusion
that things are okay now if block_cloning is completely avoided.

===
Mark Millard
marklmi at yahoo.com
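
For anyone wanting to repeat a NUL-byte check like the quoted ugrep
one on a system without ugrep, the following is a minimal sketch in the
same spirit, not the command used in the thread. It assumes only POSIX
sh, tr and cmp; the function names has_nul and count_nul_files are
hypothetical. It only makes sense for directories of text files (mail,
Makefiles), since ordinary binaries legitimately contain NUL bytes.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): flag files containing NUL bytes.

# has_nul FILE: exit status 0 when FILE contains at least one NUL byte.
# tr deletes every NUL; if the result differs from the original file,
# the file had NULs in it. Unreadable files return 2 and are not flagged.
has_nul() {
    [ -r "$1" ] || return 2
    ! LC_ALL=C tr -d '\000' < "$1" | cmp -s - "$1"
}

# count_nul_files DIR: print how many regular files directly inside DIR
# (no recursion) contain at least one NUL byte.
count_nul_files() {
    count=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if has_nul "$f"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Usage would look like `count_nul_files ~/.Mail/note` for a flat MH
folder, or `has_nul path/to/Makefile.clang && echo corrupted` for a
single file in a staged port. Running it before and after a build is
one way to tell old corruption from newly written corruption.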