Re: NFSv4 crash of CURRENT

From: FreeBSD User <freebsd_at_walstatt-de.de>
Date: Mon, 15 Jan 2024 16:35:44 UTC
On Mon, 15 Jan 2024 11:53:31 +0100
Peter Blok <pblok@bsd4all.org> wrote:

> Hi,
> 
> Forgot to mention I’m on 13-stable. The fix that is causing the crash with automounted NFS
> is:
> 
> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> Author: Konstantin Belousov <kib@FreeBSD.org>
> Date:   Tue Jan 2 00:22:44 2024 +0200
> 
>     nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>     
>     (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
> 
> When I remove the fix, the problem is gone. Add it back and the crash happens.
> 
> Peter
> 
> > On 15 Jan 2024, at 09:31, Peter Blok <pblok@bsd4all.org> wrote:
> > 
> > Hi,
> > 
> > I have a crash on an NFS client running today's stable
> > (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe it is the
> > same problem.
> > 
> > I have ports automounted on /am/ports. When I do cd /am/ports/sys and press Tab to
> > autocomplete, it crashes with the stack trace below. If I mount ports plainly on /usr/ports
> > and do the same, everything works. I am using NFSv3.
> > 
> > Peter
> > 
> > 
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 04
> > fault virtual address	= 0x89
> > fault code		= supervisor read data, page not present
> > instruction pointer	= 0x20:0xffffffff809645d4
> > stack pointer	        = 0x28:0xfffffe00acadb830
> > frame pointer	        = 0x28:0xfffffe00acadb830
> > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > 			= DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags	= interrupt enabled, resume, IOPL = 0
> > current process		= 6869 (csh)
> > trap number		= 12
> > panic: page fault
> > cpuid = 2
> > time = 1705306940
> > KDB: stack backtrace:
> > #0 0xffffffff806232f5 at kdb_backtrace+0x65
> > #1 0xffffffff805d7a02 at vpanic+0x152
> > #2 0xffffffff805d78a3 at panic+0x43
> > #3 0xffffffff809d58ad at trap_fatal+0x38d
> > #4 0xffffffff809d58ff at trap_pfault+0x4f
> > #5 0xffffffff809af048 at calltrap+0x8
> > #6 0xffffffff804c7a7e at ncl_bioread+0xb7e
> > #7 0xffffffff804b9d90 at nfs_readdir+0x1f0
> > #8 0xffffffff8069c61a at vop_sigdefer+0x2a
> > #9 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
> > #10 0xffffffff81ce75de at autofs_readdir+0x2ce
> > #11 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
> > #12 0xffffffff806c3002 at kern_getdirentries+0x222
> > #13 0xffffffff806c33a9 at sys_getdirentries+0x29
> > #14 0xffffffff809d6180 at amd64_syscall+0x110
> > #15 0xffffffff809af95b at fast_syscall_common+0xf8
> > 
> > 
> >   
> >> On 15 Jan 2024, at 06:46, FreeBSD User <freebsd@walstatt-de.de> wrote:
> >> 
> >> On Sun, 14 Jan 2024 20:34:12 -0800
> >> Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> >>   
> >>> In message <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>,
> >>> Rick Macklem writes:
> >>>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-lists@klop.ws> wrote:
> >>>>> 
> >>>>> 
> >>>>> From: FreeBSD User <freebsd@walstatt-de.de>
> >>>>> Date: 13 January 2024 19:34
> >>>>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
> >>>>> Subject: NFSv4 crash of CURRENT
> >>>>> 
> >>>>> Hello,
> >>>>> 
> >>>>> I am running a CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> >>>>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is the same OS revision as the
> >>>>> mentioned client, the other is FreeBSD 13.2-RELEASE-p8. Both offer NFSv4 filesystems,
> >>>>> non-kerberized.
> >>>>> 
> >>>>> I can crash the client reproducibly by accessing either NFSv4 FS (a simple ls -la).
> >>>>> The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the
> >>>>> client host; luckily the box recovers.
> >>>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
> >>>> I did a commit to main that changes the interface between these two
> >>>> modules and did bump the
> >>>> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> >>>> (If you have "options NFSCL" in your kernel config, both should have
> >>>> been rebuilt as a part of
> >>>> the kernel build.)
> >>>>   
> >>> 
> >>> Is anyone by chance seeing autofs in the backtrace too?
> >>> 
> >>>   
> >> 
> >> Hello Cy Schubert,
> >> 
> >> I forgot to mention that those crashes occur with autofs-mounted filesystems. Good
> >> question, by the way; I will check whether the crashes also happen when mounting the
> >> traditional way.
> >> 
> >> Kind regards,
> >> 
> >> oh
> >> 
> >> -- 
> >> O. Hartmann  
> >   
> 

good catch!

-- 
O. Hartmann