Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
Date: Thu, 24 Aug 2023 20:07:22 UTC
On Aug 24, 2023, at 00:22, Mark Millard <marklmi@yahoo.com> wrote:

> On Aug 23, 2023, at 22:54, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
>> On 8/24/23, Mark Millard <marklmi@yahoo.com> wrote:
>>> On Aug 23, 2023, at 15:10, Mateusz Guzik <mjguzik@gmail.com> wrote:
>>>
>>>> On 8/23/23, Mark Millard <marklmi@yahoo.com> wrote:
>>>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>>> . . .
>>>>
>>>> This is a known problem, but it is unclear if you should be running
>>>> into it in this setup.
>>>
>>> The change fixed the issue, so I do run into the issue
>>> for this setup. See below.
>>>
>>>> Can you try again but this time *revert*
>>>> 138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
>>>> git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>>>>
>>>> may want to switch to a different branch first, for example: git
>>>> checkout -b vfstesting
>>>
>>> # git -C /usr/main-src/ diff sys/kern/vfs_subr.c
>>> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
>>> index 0f3f00abfd4a..5dff556ac258 100644
>>> --- a/sys/kern/vfs_subr.c
>>> +++ b/sys/kern/vfs_subr.c
>>> @@ -3528,25 +3528,17 @@ vdbatch_process(struct vdbatch *vd)
>>>         MPASS(curthread->td_pinned > 0);
>>>         MPASS(vd->index == VDBATCH_SIZE);
>>> +       mtx_lock(&vnode_list_mtx);
>>>         critical_enter();
>>> -       if (mtx_trylock(&vnode_list_mtx)) {
>>> -               for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -                       vp = vd->tab[i];
>>> -                       vd->tab[i] = NULL;
>>> -                       TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> -                       TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> -                       MPASS(vp->v_dbatchcpu != NOCPU);
>>> -                       vp->v_dbatchcpu = NOCPU;
>>> -               }
>>> -               mtx_unlock(&vnode_list_mtx);
>>> -       } else {
>>> -               for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -                       vp = vd->tab[i];
>>> -                       vd->tab[i] = NULL;
>>> -                       MPASS(vp->v_dbatchcpu != NOCPU);
>>> -                       vp->v_dbatchcpu = NOCPU;
>>> -               }
>>> +       for (i = 0; i < VDBATCH_SIZE; i++) {
>>> +               vp = vd->tab[i];
>>> +               TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> +               TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> +               MPASS(vp->v_dbatchcpu != NOCPU);
>>> +               vp->v_dbatchcpu = NOCPU;
>>>         }
>>> +       mtx_unlock(&vnode_list_mtx);
>>> +       bzero(vd->tab, sizeof(vd->tab));
>>>         vd->index = 0;
>>>         critical_exit();
>>>  }
>>>
>>> Still with:
>>>
>>> # grep USE_TMPFS= /usr/local/etc/poudriere.conf
>>> # EXAMPLE: USE_TMPFS="wrkdir data"
>>> #USE_TMPFS=all
>>> #USE_TMPFS="data"
>>> USE_TMPFS=no
>>>
>>>
>>> That allowed the other builders to eventually reach "Builder started"
>>> and later activity, [00:05:50] [27] [00:02:29] Builder started
>>> being the first non-[01] to do so, no vlruwk's observed in what
>>> I saw in top:
>>>
>>> . . .
>>>
>>> Now testing for the zfs deadlock issue should be possible for
>>> this setup.
>>>
>>
>> Thanks for testing, I wrote a fix:
>>
>> https://people.freebsd.org/~mjg/vfs-recycle-fix.diff
>>
>> Applies to *stock* kernel (as in without the revert).
>
> I'm going to leave the deadlock test running for when
> I sleep tonight. So it is going to be a while before
> I get to testing this. $ work will likely happen first
> as well. (No deadlock observed yet, by the way. 6+ hrs
> and 3000+ ports built so far.)
>
> I can easily restore the sys/kern/vfs_subr.c to then
> do normal 14.0-ALPHA2-ish based patching with: so not
> a problem. Thanks.
>

I stopped the deadlock experiment, cleaned out the partial
bulk -a, put back the modern sys/kern/vfs_subr.c, applied
your patch, built, installed, rebooted, and started another
bulk -a run.

It made progress on all the builders, reaching and passing
"Builder started":

. . .
[00:01:34] Building 34042 packages using up to 32 builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:22] [02] [00:00:00] Builder starting
[00:03:22] [03] [00:00:00] Builder starting
[00:03:22] [04] [00:00:00] Builder starting
[00:03:22] [05] [00:00:00] Builder starting
[00:03:22] [06] [00:00:00] Builder starting
[00:03:22] [07] [00:00:00] Builder starting
[00:03:22] [08] [00:00:00] Builder starting
[00:03:22] [09] [00:00:00] Builder starting
[00:03:22] [10] [00:00:00] Builder starting
[00:03:22] [11] [00:00:00] Builder starting
[00:03:22] [12] [00:00:00] Builder starting
[00:03:22] [13] [00:00:00] Builder starting
[00:03:22] [14] [00:00:00] Builder starting
[00:03:22] [15] [00:00:00] Builder starting
[00:03:22] [16] [00:00:00] Builder starting
[00:03:22] [17] [00:00:00] Builder starting
[00:03:22] [18] [00:00:00] Builder starting
[00:03:22] [19] [00:00:00] Builder starting
[00:03:22] [20] [00:00:00] Builder starting
[00:03:22] [21] [00:00:00] Builder starting
[00:03:22] [22] [00:00:00] Builder starting
[00:03:22] [23] [00:00:00] Builder starting
[00:03:22] [24] [00:00:00] Builder starting
[00:03:22] [25] [00:00:00] Builder starting
[00:03:22] [26] [00:00:00] Builder starting
[00:03:22] [27] [00:00:00] Builder starting
[00:03:22] [28] [00:00:00] Builder starting
[00:03:22] [29] [00:00:00] Builder starting
[00:03:22] [30] [00:00:00] Builder starting
[00:03:22] [31] [00:00:00] Builder starting
[00:03:22] [32] [00:00:00] Builder starting
[00:03:30] [01] [00:00:08] Finished print/indexinfo | indexinfo-0.3.1: Success
[00:03:30] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
[00:04:42] [01] [00:01:12] Finished devel/gettext-runtime | gettext-runtime-0.22: Success
[00:04:48] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22
[00:05:46] [19] [00:02:24] Builder started
[00:05:46] [15] [00:02:24] Builder started
[00:05:46] [19] [00:00:00] Building graphics/libpotrace | libpotrace-1.16
[00:05:46] [15] [00:00:00] Building devel/libdaemon | libdaemon-0.14_1
[00:05:46] [25] [00:02:24] Builder started
[00:05:46] [25] [00:00:00] Building audio/speexdsp | speexdsp-1.2.1
[00:05:46] [29] [00:02:24] Builder started
[00:05:46] [29] [00:00:00] Building devel/opencl | opencl-3.0.14
. . .

Thanks.

I'll let it run as another deadlock test. The prior
run built over 9400 in about 18.5 hr before I stopped
it (no deadlocks observed).

===
Mark Millard
marklmi at yahoo.com