poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder

From: Mark Millard <marklmi_at_yahoo.com>
Date: Wed, 23 Aug 2023 21:50:38 UTC
[Forked off the ZFS deadlock 14 discussion, per feedback.]

On Aug 23, 2023, at 11:40, Alexander Motin <mav@FreeBSD.org> wrote:

> On 22.08.2023 14:24, Mark Millard wrote:
>> Alexander Motin <mav_at_FreeBSD.org> wrote on
>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>> I am waiting for final test results from George Wilson and then will
>>> request quick merge of both to zfs-2.2-release branch. Unfortunately
>>> there are still not many reviewers for the PR, since the code is not
>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
> has tested and/or reviewed the PR, you may comment on it.
>> I had written to the list that when I tried to test the system
>> doing poudriere builds (initially with your patches) using
>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>> instead got only one builder that ended up active, the others
>> never reaching "Builder started":
> 
>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>> . . .
>>  362     0 root         40    0  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
>>  349     0 root         53    0  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
>>  328     0 root         68    0  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
>>  304     0 root         37    0  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
>>  282     0 root         42    0  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
>>  242     0 root         56    0  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
>> . . .
>> But those processes did show CPU?? on occasion, as well as
>> *vnode less often. None of the cpdup's was stuck in
>> Removing your patches did not change the behavior.
> 
> Mark, to me "vlruwk" looks like a limit on number of vnodes.  I was not deep in that area at least recently, so somebody with more experience there could try to diagnose it.  At very least it does not look related to the ZIL issue discussed in this thread, at least with the information provided, so I am not surprised that the mentioned patches do not affect it.

I did the above intending to test the deadlock in my context, but I
ended up not getting that far once I made ZFS handle all the file
I/O (USE_TMPFS=no and no other use of tmpfs or the like).
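For reference, this is just the USE_TMPFS setting in poudriere.conf
(the path below is the usual default for the port; it may differ on
other installs):

# grep USE_TMPFS /usr/local/etc/poudriere.conf
USE_TMPFS=no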

The ZFS context is a simple single partition on the boot media. I
use ZFS for bectl BE (boot environment) use, not for the other
typical reasons. The media here is PCIe Optane 1.4T media. The
machine is a first-generation ThreadRipper 1950X with 128 GiBytes
of RAM and 491520 MiBytes of swap, also on that Optane.

# uname -apKU
FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112 main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023     root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400096 1400096

The GENERIC-DBG variant of the kernel did not report any issues in
earlier testing.

The later-referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
installed from the same build.

# zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
zoptb                                        79.9G   765G    96K  /zoptb
zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
zoptb/ROOT                                   13.1G   765G    96K  none
zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
zoptb/poudriere                              6.58G   765G   112K  /usr/local/poudriere
zoptb/poudriere/data                         6.58G   765G   128K  /usr/local/poudriere/data
zoptb/poudriere/data/.m                       112K   765G   112K  /usr/local/poudriere/data/.m
zoptb/poudriere/data/cache                   17.4M   765G  17.4M  /usr/local/poudriere/data/cache
zoptb/poudriere/data/images                    96K   765G    96K  /usr/local/poudriere/data/images
zoptb/poudriere/data/logs                    2.72G   765G  2.72G  /usr/local/poudriere/data/logs
zoptb/poudriere/data/packages                3.84G   765G  3.84G  /usr/local/poudriere/data/packages
zoptb/poudriere/data/wrkdirs                  112K   765G   112K  /usr/local/poudriere/data/wrkdirs
zoptb/poudriere/jails                          96K   765G    96K  /usr/local/poudriere/jails
zoptb/poudriere/ports                          96K   765G    96K  /usr/local/poudriere/ports
zoptb/tmp                                    68.5M   765G  68.5M  /tmp
zoptb/usr                                    35.1G   765G    96K  /usr
zoptb/usr/13_0R-src                          2.64G   765G  2.64G  /usr/13_0R-src
zoptb/usr/alt-main-src                         96K   765G    96K  /usr/alt-main-src
zoptb/usr/home                                181M   765G   181M  /usr/home
zoptb/usr/local                              5.08G   765G  5.08G  /usr/local
zoptb/usr/main-src                            833M   765G   833M  /usr/main-src
zoptb/usr/ports                              26.4G   765G  26.4G  /usr/ports
zoptb/usr/src                                  96K   765G    96K  /usr/src
zoptb/var                                    52.6M   765G    96K  /var
zoptb/var/audit                               356K   765G   356K  /var/audit
zoptb/var/crash                               128K   765G   128K  /var/crash
zoptb/var/db                                 49.7M   765G    96K  /var/db
zoptb/var/db/pkg                             49.4M   765G  49.4M  /var/db/pkg
zoptb/var/db/ports                            164K   765G   164K  /var/db/ports
zoptb/var/log                                1.61M   765G  1.61M  /var/log
zoptb/var/mail                                632K   765G   632K  /var/mail
zoptb/var/tmp                                 128K   765G   128K  /var/tmp

# poudriere jail -jmain-amd64-bulk_a -i
Jail name:         main-amd64-bulk_a
Jail version:      14.0-ALPHA2
Jail arch:         amd64
Jail method:       null
Jail mount:        /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
Jail fs:           
Jail updated:      2021-12-04 14:55:22
Jail pkgbase:      disabled



So, I set up another test, with some related information shown
before, during, and after. The sysctl output is from a different
ssh session than the one running the bulk -a.
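(A simple way to sample these counters over time from the extra ssh
session, rather than rerunning the command by hand, would be a loop
like the following; the 60-second interval is arbitrary:

# while true; do date; sysctl vfs.numvnodes vfs.freevnodes vfs.vnode_alloc_sleeps; sleep 60; done
)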

# sysctl -a | grep vnode
kern.maxvnodes: 2213808
kern.ipc.umtx_vnode_persistent: 0
kern.minvnodes: 553452
vm.vnode_pbufs: 2048
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 272429
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 12461
vfs.vnode_alloc_sleeps: 0
vfs.wantfreevnodes: 553452
vfs.freevnodes: 962766
vfs.vnodes_created: 2538980
vfs.numvnodes: 1056233
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 91878
debug.vnode_domainset: <NULL>
debug.sizeof.vnode: 448
debug.fail_point.status_fill_kinfo_vnode__random_path: off
debug.fail_point.fill_kinfo_vnode__random_path: off

# poudriere bulk -jmain-amd64-bulk_a -a
. . .
[00:01:34] Building 34042 packages using up to 32 builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:22] [02] [00:00:00] Builder starting
[00:03:22] [03] [00:00:00] Builder starting
. . .
[00:03:22] [31] [00:00:00] Builder starting
[00:03:22] [32] [00:00:00] Builder starting
[00:03:31] [01] [00:00:09] Finished print/indexinfo | indexinfo-0.3.1: Success
[00:03:31] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22
. . .

Note that only [01] makes progress: no new "Builder started"
notices occur. top shows 31 processes of the pattern:
cpdup -i0 -o ref ??
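
(For any one of those stuck cpdup's, the kernel stack can also be
grabbed from the other ssh session with procstat, using whatever
PID top shows; 349 below is just taken from the earlier example
top output:

# procstat -kk 349
)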

Then, while the 31 cpdup's were showing vlruwk most of the time:

# sysctl -a | grep vnode
kern.maxvnodes: 2213808
kern.ipc.umtx_vnode_persistent: 0
kern.minvnodes: 553452
vm.vnode_pbufs: 2048
vm.stats.vm.v_vnodepgsout: 22844
vm.stats.vm.v_vnodepgsin: 582398
vm.stats.vm.v_vnodeout: 890
vm.stats.vm.v_vnodein: 34296
vfs.vnode_alloc_sleeps: 2994
vfs.wantfreevnodes: 553452
vfs.freevnodes: 2209662
vfs.vnodes_created: 12206299
vfs.numvnodes: 2214071
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 459
debug.vnode_domainset: <NULL>
debug.sizeof.vnode: 448
debug.fail_point.status_fill_kinfo_vnode__random_path: off
debug.fail_point.fill_kinfo_vnode__random_path: off

Waiting a while longer, the cpdup's still show mostly the vlruwk status:

# sysctl -a | grep vnode
kern.maxvnodes: 2213808
kern.ipc.umtx_vnode_persistent: 0
kern.minvnodes: 553452
vm.vnode_pbufs: 2048
vm.stats.vm.v_vnodepgsout: 22844
vm.stats.vm.v_vnodepgsin: 583527
vm.stats.vm.v_vnodeout: 890
vm.stats.vm.v_vnodein: 34396
vfs.vnode_alloc_sleeps: 8053
vfs.wantfreevnodes: 553452
vfs.freevnodes: 2210166
vfs.vnodes_created: 12212061
vfs.numvnodes: 2215106
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 497
debug.vnode_domainset: <NULL>
debug.sizeof.vnode: 448
debug.fail_point.status_fill_kinfo_vnode__random_path: off
debug.fail_point.fill_kinfo_vnode__random_path: off

^C[00:14:55] Error: Signal SIGINT caught, cleaning up and exiting

# sysctl -a | grep vnode
kern.maxvnodes: 2213808
kern.ipc.umtx_vnode_persistent: 0
kern.minvnodes: 553452
vm.vnode_pbufs: 2048
vm.stats.vm.v_vnodepgsout: 22844
vm.stats.vm.v_vnodepgsin: 584474
vm.stats.vm.v_vnodeout: 890
vm.stats.vm.v_vnodein: 34591
vfs.vnode_alloc_sleeps: 17584
vfs.wantfreevnodes: 553452
vfs.freevnodes: 2210796
vfs.vnodes_created: 12222343
vfs.numvnodes: 2216564
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 539
debug.vnode_domainset: <NULL>
debug.sizeof.vnode: 448
debug.fail_point.status_fill_kinfo_vnode__random_path: off
debug.fail_point.fill_kinfo_vnode__random_path: off

[main-amd64-bulk_a-default] [2023-08-23_13h58m08s] [sigint:] Queued: 34435 Built: 2     Failed: 0     Skipped: 35    Ignored: 358   Fetched: 0     Tobuild: 34040  Time: 00:14:36
[00:16:13] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-08-23_13h58m08s
[00:16:49] Cleaning up
load: 5.28  cmd: sh 77057 [vlruwk] 141.63r 0.00u 30.98s 28% 6932k
#0 0xffffffff80b76ebb at mi_switch+0xbb
#1 0xffffffff80bc960f at sleepq_timedwait+0x2f
#2 0xffffffff80b76610 at _sleep+0x1d0
#3 0xffffffff80c5b2dc at vn_alloc_hard+0x2ac
#4 0xffffffff80c50a12 at getnewvnode_reserve+0x92
#5 0xffffffff829afb12 at zfs_zget+0x22
#6 0xffffffff8299ca8d at zfs_dirent_lookup+0x16d
#7 0xffffffff8299cb5f at zfs_dirlook+0x7f
#8 0xffffffff829ac410 at zfs_lookup+0x350
#9 0xffffffff829a782a at zfs_freebsd_cachedlookup+0x6a
#10 0xffffffff80c368ad at vfs_cache_lookup+0xad
#11 0xffffffff80c3b6d8 at cache_fplookup_final_modifying+0x188
#12 0xffffffff80c38766 at cache_fplookup+0x356
#13 0xffffffff80c43fb2 at namei+0x112
#14 0xffffffff80c62e5b at kern_funlinkat+0x13b
#15 0xffffffff80c62d18 at sys_unlink+0x28
#16 0xffffffff83b8e583 at filemon_wrapper_unlink+0x13
#17 0xffffffff81049a79 at amd64_syscall+0x109
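
(The "load:" line and kernel stack above are presumably Ctrl+T
(SIGINFO) output; whether the kernel stack gets printed there is
controlled by the kern.tty_info_kstacks sysctl, which can be
checked or enabled with:

# sysctl kern.tty_info_kstacks
)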

[00:26:28] Unmounting file systems
Exiting with status 1

# sysctl -a | grep vnode
kern.maxvnodes: 2213808
kern.ipc.umtx_vnode_persistent: 0
kern.minvnodes: 553452
vm.vnode_pbufs: 2048
vm.stats.vm.v_vnodepgsout: 22844
vm.stats.vm.v_vnodepgsin: 585384
vm.stats.vm.v_vnodeout: 890
vm.stats.vm.v_vnodein: 34798
vfs.vnode_alloc_sleeps: 27578
vfs.wantfreevnodes: 553452
vfs.freevnodes: 61362
vfs.vnodes_created: 20135479
vfs.numvnodes: 59860
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 208
debug.vnode_domainset: <NULL>
debug.sizeof.vnode: 448
debug.fail_point.status_fill_kinfo_vnode__random_path: off
debug.fail_point.fill_kinfo_vnode__random_path: off


For reference (from after):

# kldstat
Id Refs Address                Size Name
 1   95 0xffffffff80200000  274b308 kernel
 2    1 0xffffffff8294c000   5d5238 zfs.ko
 3    1 0xffffffff82f22000     7718 cryptodev.ko
 4    1 0xffffffff83b10000     3390 acpi_wmi.ko
 5    1 0xffffffff83b14000     3220 intpm.ko
 6    1 0xffffffff83b18000     2178 smbus.ko
 7    1 0xffffffff83b1b000     2240 cpuctl.ko
 8    1 0xffffffff83b1e000     3360 uhid.ko
 9    1 0xffffffff83b22000     4364 ums.ko
10    1 0xffffffff83b27000     33c0 usbhid.ko
11    1 0xffffffff83b2b000     3380 hidbus.ko
12    1 0xffffffff83b2f000     4d20 ng_ubt.ko
13    6 0xffffffff83b34000     abb8 netgraph.ko
14    2 0xffffffff83b3f000     a250 ng_hci.ko
15    4 0xffffffff83b4a000     2670 ng_bluetooth.ko
16    1 0xffffffff83b4d000     83a0 uftdi.ko
17    1 0xffffffff83b56000     4e58 ucom.ko
18    1 0xffffffff83b5b000     3360 wmt.ko
19    1 0xffffffff83b5f000     e268 ng_l2cap.ko
20    1 0xffffffff83b6e000    1bf68 ng_btsocket.ko
21    1 0xffffffff83b8a000     38f8 ng_socket.ko
22    1 0xffffffff83b8e000     3250 filemon.ko
23    1 0xffffffff83b92000     4758 nullfs.ko
24    1 0xffffffff83b97000     73c0 linprocfs.ko
25    3 0xffffffff83b9f000     be70 linux_common.ko
26    1 0xffffffff83bab000     3558 fdescfs.ko
27    1 0xffffffff83baf000    31b20 linux.ko
28    1 0xffffffff83be1000    2ed40 linux64.ko


Note that before the "Cleaning up" notice, vfs.freevnodes shows as
being around (for example) 2210796, but after "Exiting with status"
it is 61362. vfs.vnodes_created has a similar staging: in the ball
park of up to 12222343 before, then changing to 20135479. Similarly,
vfs.numvnodes went 2216564 -> 59860.
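
(If it helps for a later run, snapshotting the relevant sysctl
output before and after and diffing it would show all those jumps
at once; the file names here are just examples:

# sysctl vfs.numvnodes vfs.freevnodes vfs.vnodes_created > /tmp/vnodes.before
. . . run the bulk and let it clean up . . .
# sysctl vfs.numvnodes vfs.freevnodes vfs.vnodes_created > /tmp/vnodes.after
# diff /tmp/vnodes.before /tmp/vnodes.after
)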



Anything else I should gather and report as basic information?

===
Mark Millard
marklmi at yahoo.com