Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
Date: Thu, 28 Nov 2024 18:16:16 UTC
On Mon, 25 Nov 2024, Mark Millard wrote:

> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Top posting going in a different direction that
>> established a way to control the behavior in my
>> context . . .
>
> For folks new to the discoveries: the context here
> is poudriere bulk builds, for USE_TMPFS=all vs.
> USE_TMPFS=no . My test context is amd64 on a
> 7950X3D system with 192 GiBytes of RAM. Others have
> other contexts, including an Intel system.

I have been seeing some odd behavior from Firefox as well as from
poudriere builds on my system. Both touch tmpfs: I have set up /tmp as
tmpfs, which Firefox uses, and poudriere runs with USE_TMPFS=all.

The system has been an experiment, for me, with undervolting. I have
been attributing any flakiness to the undervolting, but I have since
reduced the undervolt a lot and the instability has not changed: it has
stayed rare. I have lost count of how many times I have run memtest86
on this system.

System setup:
- FreeBSD 14.2-STABLE
- i7-14700K (latest BIOS which *should* fix Intel power-related bugs)
- 128 GiB RAM
- ZFS (mirrored drives)
- 2 encrypted swap partitions (64 GiB each, lightly used)
- Lightly undervolted (-0.06 offset to Global Core SVID Voltage)
- /tmp is tmpfs
- ${HOME}/.cache is tmpfs
- Poudriere:
  - USE_TMPFS=all
  - ccache
  - jail version in sync with host
  - /usr/ports is mounted with nullfs

I have wondered if it was swap-related, but recently I noticed a build
failure with games/veloren-weekly where swap was available but zero
bytes were used. The system was under little load at the time, so
undervolting is less likely to have been a factor.

Build failure:
-----------------------------
portpicker = { path = '/wrkdirs/usr/ports/games/veloren-weekly/work/portpicker-rs-df6b37872f3586ac3b21d08b56c8ec7cd92fb172' }
===>   Updating Cargo.lock
error: checksum for `windows_x86_64_msvc v0.42.2` changed between lock files

this could be indicative of a few possible errors:

    * the lock file is corrupt
    * a replacement source in use (e.g., a mirror) returned a different checksum
    * the source itself may be corrupt in one way or another

unable to verify that `windows_x86_64_msvc v0.42.2` is the same as when the lockfile was generated

*** Error code 101
-----------------------------

Restarting the build succeeded.

>> I changed USE_TMPFS=all to USE_TMPFS=no :
>>
>> USE_TMPFS=all gets the failure

*snip*

>> vs.
>> USE_TMPFS=no works just fine
>>
>> So it is a FreeBSD system error associated with
>> use of tmpfs .
>
> Recent work on tmpfs includes:
>
> Mon, 09 Sep 2024
>     • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> Fri, 04 Oct 2024
>     • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> Sun, 13 Oct 2024
>     • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> Thu, 24 Oct 2024
>     • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
>
> swap_pager (given the reference to it above):
>
> Tue, 08 Oct 2024
>     • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> Fri, 11 Oct 2024
>     • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> Thu, 24 Oct 2024
>     • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
>     • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
>     • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> Sat, 26 Oct 2024
>     • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> Wed, 13 Nov 2024
>     • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
>
> I do not know at this time when the corruptions started. The
> above is only suggestive.

Thank you for listing those. I need to find some time to look over
those changes, although I am no kernel guru by a long shot. However, I
see now that much more knowledgeable people are already looking at the
issue on the current mailing list.

Sean
--
scf@FreeBSD.org
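
For anyone who wants to check a suspect file for the symptom named in the subject line (blocks of zeros that should not be all zeros), the following is a minimal sketch. It is not something from this thread. It assumes the corruption appears as whole, aligned blocks of zero bytes and that 4096 bytes is the relevant block size; both are assumptions, and the function name find_zero_blocks is made up for illustration. A hit is only a candidate, since many healthy files legitimately contain zero-filled blocks.

```python
BLOCK_SIZE = 4096  # assumed block/page size; not confirmed anywhere in this thread


def find_zero_blocks(path, block_size=BLOCK_SIZE):
    """Return the byte offsets of every full, aligned block in `path`
    that consists entirely of zero bytes.

    A trailing partial block is never reported, even if it is all zeros.
    """
    zero_block = bytes(block_size)  # block_size zero bytes to compare against
    offsets = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file
            if len(chunk) == block_size and chunk == zero_block:
                offsets.append(offset)
            offset += len(chunk)
    return offsets
```

Running it over a file that failed verification (for example a distfile or extracted source under the tmpfs-backed wrkdir) and over a known-good copy of the same file, then comparing the two offset lists, would show whether the bad copy has gained zeroed blocks that the good copy lacks.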