Re: panic: nfsv4root ref cnt cpuid = 1

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Sun, 22 Sep 2024 23:22:18 UTC
On Sun, Sep 22, 2024 at 7:28 AM J David <j.david.lists@gmail.com> wrote:
>
> On Sun, Sep 22, 2024 at 10:17 AM J David <j.david.lists@gmail.com> wrote:
> > #8 0xffffffff8302c3a7 at null_lookup+0xc7
>
> After noticing null_lookup in the crash trace, I realized that this
> must be an nfs filesystem that is then remounted elsewhere via nullfs.
> We've eliminated most of those.
>
> There's only one filesystem that is still used with nullfs
> (specifically to avoid a large number of otherwise identical mounts).
> Here are the "nfsstat -m" mount flags for that filesystem:
>
> nfsv4,minorversion=2,oneopenown,tcp,resvport,nconnect=1,hard,cto,nolockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647

I think I know what causes the crashes. The attached trivial patch should
work around them, but if you cannot apply a kernel source patch, the
only workaround would be to get rid of "oneopenown".
(Using nullfs may be a factor, since I think the crash occurs when the
code sleeps for the lock used to serialize opens for oneopenown. The mutex
is released and re-acquired around that sleep, so the "struct nfsclopen *"
could be bogus afterwards.)
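
To illustrate the kind of race I mean, here is a rough userland sketch.
It is not the NFS client code: open_rec, find_open(), sleep_for_lock() and
the rest are made-up names, and a pthread mutex stands in for the kernel
mutex. It only shows the general pattern of looking the structure up again
after the mutex has been dropped, which is not necessarily what the
attached patch or the eventual fix does.

```c
/* Userland illustration only; not the FreeBSD NFS client code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct open_rec {		/* hypothetical stand-in for an open record */
	int id;
	struct open_rec *next;
};

static pthread_mutex_t list_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct open_rec *open_list;

/* Caller must hold list_mtx. */
static struct open_rec *
find_open(int id)
{
	struct open_rec *op;

	for (op = open_list; op != NULL; op = op->next)
		if (op->id == id)
			return (op);
	return (NULL);
}

static void
add_open(int id)
{
	struct open_rec *op = malloc(sizeof(*op));

	assert(op != NULL);
	op->id = id;
	pthread_mutex_lock(&list_mtx);
	op->next = open_list;
	open_list = op;
	pthread_mutex_unlock(&list_mtx);
}

/* Unlink and free the record; stands in for another thread's close. */
static void
remove_open(int id)
{
	struct open_rec **opp, *op;

	pthread_mutex_lock(&list_mtx);
	for (opp = &open_list; (op = *opp) != NULL; opp = &op->next)
		if (op->id == id) {
			*opp = op->next;
			free(op);
			break;
		}
	pthread_mutex_unlock(&list_mtx);
}

/*
 * Models sleeping for a serialization lock: the mutex is dropped, other
 * activity (while_asleep) may run, then the mutex is re-acquired.
 */
static void
sleep_for_lock(void (*while_asleep)(int), int id)
{
	pthread_mutex_unlock(&list_mtx);
	if (while_asleep != NULL)
		while_asleep(id);
	pthread_mutex_lock(&list_mtx);
}

/* Returns 0 if the record is still valid after the sleep, -1 otherwise. */
static int
use_open_safely(int id, void (*while_asleep)(int))
{
	struct open_rec *op;
	int ret;

	pthread_mutex_lock(&list_mtx);
	op = find_open(id);
	if (op != NULL) {
		sleep_for_lock(while_asleep, id);
		/* op may be stale now: look it up again instead of reusing it. */
		op = find_open(id);
	}
	ret = (op != NULL) ? 0 : -1;
	pthread_mutex_unlock(&list_mtx);
	return (ret);
}
```

In this sketch, calling use_open_safely(1, remove_open) after add_open(1)
returns -1, because the record is freed while the mutex is dropped. Using
the pointer obtained before the sleep would instead touch freed memory.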

If you cannot get rid of the "oneopenown" or apply the kernel source patch,
getting rid of the nullfs mount or enabling delegations might also work around
this.

I will need to work on a correct fix, but it wouldn't make it into an update
for quite a while.

Sorry about the breakage, rick

>
> The server of this filesystem is 14.1-RELEASE-p5 and the exported
> filesystem is a readonly ZFS dataset.
>
> Thanks!
>