Re: panic: nfsv4root ref cnt cpuid = 1
- In reply to: Rick Macklem : "Re: panic: nfsv4root ref cnt cpuid = 1"
Date: Fri, 27 Sep 2024 15:33:15 UTC
Circling back around to whether it's better to NFS mount once and nullfs mount lots, or NFS mount lots, I've unfortunately gathered some additional data.

We set up a version of our code that mounts the requisite NFS filesystem directly for each job/jail root. That worked fine in small-scale testing. In a wider deployment, however, disaster ensued. With a few thousand mounts, we started to observe two separate forms of bad behavior:

- Requests from established sessions would hang indefinitely, leading to processes backlogging and client machines going OOM and becoming unresponsive en masse.
- The NFS server appeared to be serving empty directories.

The first one is self-explanatory. The second one might bear further explanation. The server runs ZFS. There are several datasets that contain job roots. E.g.:

tank
tank/roots
tank/roots/a
tank/roots/b
tank/roots/c
tank/roots/d

The /etc/exports looks like:

V4: /tank -sec=sys

For client machines using nullfs, there is an /etc/fstab line like:

fs:/roots /roots nfs ro,nfsv4,minorversion=2,tcp,nosuid,noatime,nolockd,noresvport,oneopenown 0 0

Under ordinary operation, NFSv4 exports the child datasets correctly. E.g.:

$ ls /roots/a
bin     etc     lib     net     proc    sbin    usr
dev     home    libexec root    tmp     var

Then a client does:

# for a "Type A" job
/sbin/mount_nullfs -o ro -o nosuid /roots/a /jobs/(job-uuid)

During the failure, I observed:

$ ls /roots
a b c d
$ ls /roots/a
$ ls /roots/b
$ ls /roots/c
$ ls /roots/d

I.e., the server appeared to have "forgotten" to descend into the child datasets and behaved as NFSv3 would have done in that situation. The server in question is FreeBSD 14.1-RELEASE-r5. There were no console diagnostics, nothing in dmesg, and negligible visible load (load average below 1.0, nfsd using ~7% of one CPU).
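The per-job client side of "nullfs mode" can be sketched as a small wrapper script. This is only a sketch: the root name and job UUID (which the original elides as (job-uuid)) are hypothetical placeholders here, and the script echoes the mount command rather than executing it, since actually running it needs root and a live NFS mount of /roots.

```shell
#!/bin/sh
# Sketch of the per-job "nullfs mode" mount, assuming /roots is already
# NFS-mounted once via the fstab line above. ROOT_NAME and JOB_UUID
# would come from the job scheduler; the values below are hypothetical.
ROOT_NAME="a"
JOB_UUID="0f0e0d0c-example"
# Echo the command instead of running it, so the sketch can be
# inspected without root privileges or a live NFS server:
echo /sbin/mount_nullfs -o ro -o nosuid "/roots/${ROOT_NAME}" "/jobs/${JOB_UUID}"
```

The point of the design is that every job shares a single NFS mount on the client; only the cheap nullfs mounts scale with the job count.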
The individual client mounts (the ones that were hanging) were a little different, because they would go straight to the subdirectory they want:

# for a "Type A" job
/sbin/mount_nfs -o tcp,nfsv4,minorversion=2,noatime -o ro -o nosuid -o noresvport fs:/roots/a /jobs/(job-uuid)

Once all the client machines were restarted in "nullfs mode", the server returned to normal operation without further intervention, so the server behavior does appear directly related to the number of client NFS mounts. I couldn't measure it exactly at the time of the incident, but I would ballpark it at about 5,000 +/- 2,000 NFS mounts across 28 client machines.

FWIW, during the ~48 hour window when we were testing direct NFS instead of nullfs on slowly increasing numbers of machines, no client using direct NFS experienced the kernel panic we're discussing here. (That's without the patch.) Contrast that with 2-3 total panics per day among the machines using nullfs. So it's possible that indirection through nullfs aggravates that particular bug.

Alas, based on the above, nullfs seems to be necessary for now. Getting the patch tested & deployed is now top of my list. Thanks!
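To firm up that ballpark during a future incident, one approach (an untested sketch; the hostnames and ssh-based collection are my assumptions, not from the setup described above) is to gather `mount -t nfs | wc -l` from each client and sum the per-host counts:

```shell
#!/bin/sh
# Sum per-client NFS mount counts, one count per line on stdin.
# The counts could be gathered with something like:
#   for h in client01 client02 ...; do ssh "$h" 'mount -t nfs | wc -l'; done
# (hostnames and passwordless ssh are assumptions for this sketch).
awk '{ total += $1 } END { print "total NFS mounts:", total }'
```

For example, feeding it three collected counts:

$ printf '180\n175\n190\n' | sh count-nfs-mounts.sh
total NFS mounts: 545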