Re: NFSv4 crash of CURRENT

From: FreeBSD User <freebsd_at_walstatt-de.de>
Date: Mon, 15 Jan 2024 16:36:24 UTC
On Mon, 15 Jan 2024 16:59:07 +0100
Peter Blok <pblok@bsd4all.org> wrote:

> Rick,
> 
> I can confirm Kostik’s fix works on 13-stable.
> 
> Peter

Me, too.
The patch fixed the reported problem.

Thank you very much.

oh

> 
> > On 15 Jan 2024, at 16:13, Peter Blok <pblok@bsd4all.org> wrote:
> > 
> > I can give it a shot on one of my clients.
> >   
> >> On 15 Jan 2024, at 16:04, Rick Macklem <rick.macklem@gmail.com> wrote:
> >> 
> >> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok <pblok@bsd4all.org> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> Forgot to mention I’m on 13-stable. The fix that is causing the crash with automounted
> >>> NFS is:
> >>> 
> >>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> >>> Author: Konstantin Belousov <kib@FreeBSD.org>
> >>> Date:   Tue Jan 2 00:22:44 2024 +0200
> >>> 
> >>>    nfsclient: limit situations when we do unlocked read-ahead by nfsiod
> >>> 
> >>>    (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
> >>> 
> >>> When I remove the fix, the problem is gone. Add it back and the crash happens.  
> >> Kostik has already come up with a probable fix. If you want it right away,
> >> here it is, but he'll probably commit it soon anyhow:
> >> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
> >> index c027d7d7c3fd..1cf45bb0c924 100644
> >> --- a/sys/fs/nfsclient/nfs_clbio.c
> >> +++ b/sys/fs/nfsclient/nfs_clbio.c
> >> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct thread *td, struct ucred *cred)
> >>        return (error);
> >> }
> >> 
> >> +static bool
> >> +ncl_bioread_dora(struct vnode *vp)
> >> +{
> >> +       vm_object_t obj;
> >> +
> >> +       obj = vp->v_object;
> >> +       if (obj == NULL)
> >> +               return (true);
> >> +       return (!vm_object_mightbedirty(vp->v_object) &&
> >> +           vp->v_object->un_pager.vnp.writemappings == 0);
> >> +}
> >> +
> >> /*
> >>  * Vnode op for read using bio
> >>  */
> >> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
> >>                 * unlocked read by nfsiod could obliterate changes
> >>                 * done by userspace.
> >>                 */
> >> -               if (nmp->nm_readahead > 0 &&
> >> -                   !vm_object_mightbedirty(vp->v_object) &&
> >> -                   vp->v_object->un_pager.vnp.writemappings == 0) {
> >> +               if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
> >>                    for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
> >>                        (off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
> >>                        rabn = lbn + 1 + nra;
> >> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
> >>                 *  directory offset cookie of the next block.)
> >>                 */
> >>                NFSLOCKNODE(np);
> >> -               if (nmp->nm_readahead > 0 &&
> >> -                   !vm_object_mightbedirty(vp->v_object) &&
> >> -                   vp->v_object->un_pager.vnp.writemappings == 0 &&
> >> +               if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
> >>                    (bp->b_flags & B_INVAL) == 0 &&
> >>                    (np->n_direofoffset == 0 ||
> >>                    (lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
> >> 
> >> rick
> >> ps: It appears that autofs causes the directory to be read before it is
> >>     open'd for some reason. I've never looked at autofs.
> >>   
> >>> 
> >>> Peter
> >>> 
> >>> On 15 Jan 2024, at 09:31, Peter Blok <pblok@bsd4all.org> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> I have a crash on an NFS client running today's stable
> >>> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe it is the
> >>> same problem.
> >>> 
> >>> I have ports automounted on /am/ports. When I type cd /am/ports/sys and press Tab to
> >>> autocomplete, it crashes with the stack trace below. If I mount ports directly on
> >>> /usr/ports and do the same, everything works. I am using NFSv3.
> >>> 
> >>> Peter
> >>> 
> >>> 
> >>> 
> >>> 
> >>> Fatal trap 12: page fault while in kernel mode
> >>> cpuid = 2; apic id = 04
> >>> fault virtual address = 0x89
> >>> fault code = supervisor read data, page not present
> >>> instruction pointer = 0x20:0xffffffff809645d4
> >>> stack pointer        = 0x28:0xfffffe00acadb830
> >>> frame pointer        = 0x28:0xfffffe00acadb830
> >>> code segment = base 0x0, limit 0xfffff, type 0x1b
> >>> = DPL 0, pres 1, long 1, def32 0, gran 1
> >>> processor eflags = interrupt enabled, resume, IOPL = 0
> >>> current process = 6869 (csh)
> >>> trap number = 12
> >>> panic: page fault
> >>> cpuid = 2
> >>> time = 1705306940
> >>> KDB: stack backtrace:
> >>> #0 0xffffffff806232f5 at kdb_backtrace+0x65
> >>> #1 0xffffffff805d7a02 at vpanic+0x152
> >>> #2 0xffffffff805d78a3 at panic+0x43
> >>> #3 0xffffffff809d58ad at trap_fatal+0x38d
> >>> #4 0xffffffff809d58ff at trap_pfault+0x4f
> >>> #5 0xffffffff809af048 at calltrap+0x8
> >>> #6 0xffffffff804c7a7e at ncl_bioread+0xb7e
> >>> #7 0xffffffff804b9d90 at nfs_readdir+0x1f0
> >>> #8 0xffffffff8069c61a at vop_sigdefer+0x2a
> >>> #9 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
> >>> #10 0xffffffff81ce75de at autofs_readdir+0x2ce
> >>> #11 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
> >>> #12 0xffffffff806c3002 at kern_getdirentries+0x222
> >>> #13 0xffffffff806c33a9 at sys_getdirentries+0x29
> >>> #14 0xffffffff809d6180 at amd64_syscall+0x110
> >>> #15 0xffffffff809af95b at fast_syscall_common+0xf8
> >>> 
> >>> 
> >>> 
> >>> On 15 Jan 2024, at 06:46, FreeBSD User <freebsd@walstatt-de.de> wrote:
> >>> 
> >>> On Sun, 14 Jan 2024 20:34:12 -0800
> >>> Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> >>> 
> >>> In message <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>,
> >>> Rick Macklem writes:
> >>> 
> >>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-lists@klop.ws> wrote:
> >>> 
> >>> 
> >>> 
> >>> From: FreeBSD User <freebsd@walstatt-de.de>
> >>> Date: 13 January 2024 19:34
> >>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
> >>> Subject: NFSv4 crash of CURRENT
> >>> 
> >>> Hello,
> >>> 
> >>> I am running a CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> >>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server runs the same OS revision as
> >>> this client; the other runs FreeBSD 13.2-RELEASE-p8. Both offer NFSv4 filesystems,
> >>> non-kerberized.
> >>> 
> >>> I can reproducibly crash the client by accessing either NFSv4 FS (a simple ls -la).
> >>> The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the
> >>> client host; luckily the box recovers.
> >>> 
> >>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
> >>> I did a commit to main that changes the interface between these two modules and
> >>> did bump the __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> >>> (If you have "options NFSCL" in your kernel config, both should have been rebuilt
> >>> as a part of the kernel build.)
> >>> 
> >>> 
> >>> Is anyone by chance seeing autofs in the backtrace too?
> >>> 
> >>> 
> >>> 
> >>> Hello Cy Schubert,
> >>> 
> >>> I forgot to mention that those crashes occur with autofs-mounted filesystems. Good
> >>> question, by the way; I will check whether crashes also happen when mounting the
> >>> traditional way.
> >>> 
> >>> Kind regards,
> >>> 
> >>> oh
> >>> 
> >>> --
> >>> O. Hartmann  
> >   
> 



-- 
O. Hartmann