[Bug 280978] Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations

Reply: bugzilla-noreply_a_freebsd.org: "[Bug 280978] Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations"
Reply: bugzilla-noreply_a_freebsd.org: "[Bug 280978] Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations"
Reply: bugzilla-noreply_a_freebsd.org: "[Bug 280978] Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations"
Reply: bugzilla-noreply_a_freebsd.org: "[Bug 280978] Kernel panics with vfs.nfsd.enable_locallocks=1 and nfs clients doing hdf5 file operations"
Go to: [ bottom of page ] [ top of archives ] [ this month ]

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 21 Aug 2024 15:00:46 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280978

            Bug ID: 280978
           Summary: Kernel panics with vfs.nfsd.enable_locallocks=1 and
                    nfs clients doing hdf5 file operations
           Product: Base System
           Version: 15.0-CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: matthew.l.dailey@dartmouth.edu

With vfs.nfsd.enable_locallocks=1, kernel panics or hung nfs server (more
rarely) can be induced from Linux nfs clients doing hdf5 file operations. In
testing, this has also sometimes resulted in irrecoverable zpool corruption due
to (I assume) memory corruption.

In testing on hardware and VMs, we have induced these failures usually within a
few hours, but sometimes within several days to a week. We have replicated this
on 13.0 through 15.0-CURRENT (20240725-82283cad12a4-271360). With this sysctl
set to 0 (default), we are unable to replicate the issue, even after several
weeks of 24/7 hdf5 file operations.

Below is a backtrace from a recent panic on a test VM. Based on a suggestion
from a colleague, we are currently running a test with a VM on 15.0-CURRENT
(20240725-82283cad12a4-271360) with a single CPU, just to see if this has any
effect.

Thanks and please let me know what other information I can provide.

#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0)
    at /usr/src/sys/kern/kern_shutdown.c:404
#2  0xffffffff8049e0fa in db_dump (dummy=<optimized out>,
    dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:596
#3  0xffffffff8049deed in db_command (last_cmdp=<optimized out>,
    cmd_table=<optimized out>, dopager=true)
    at /usr/src/sys/ddb/db_command.c:508
#4  0xffffffff8049dbad in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:555
#5  0xffffffff804a15f6 in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/src/sys/ddb/db_main.c:267
#6  0xffffffff80b9c49f in kdb_trap (type=type@entry=3, code=code@entry=0,
    tf=tf@entry=0xfffffe00da2dfce0) at /usr/src/sys/kern/subr_kdb.c:790
#7  0xffffffff81068479 in trap (frame=0xfffffe00da2dfce0)
    at /usr/src/sys/amd64/amd64/trap.c:606
#8  <signal handler called>
#9  kdb_enter (why=<optimized out>, msg=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:556
#10 0xffffffff80b4cd40 in vpanic (fmt=0xffffffff811db2b0 "%s",
    ap=ap@entry=0xfffffe00da2dff10) at /usr/src/sys/kern/kern_shutdown.c:967
#11 0xffffffff80b4cbc3 in panic (
    fmt=0xffffffff81b98380 <cnputs_mtx> "N\235\024\201\377\377\377\377")
    at /usr/src/sys/kern/kern_shutdown.c:892
#12 0xffffffff81068edb in trap_fatal (frame=0xfffffe00da2e0010, eva=7)
    at /usr/src/sys/amd64/amd64/trap.c:950
#13 0xffffffff81068f80 in trap_pfault (frame=<optimized out>, usermode=false,
    signo=0x0, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:758
#14 <signal handler called>
#15 0xffffffff80a42cc7 in nfsrv_freelockowner (
    stp=stp@entry=0xfffff8000459a600, vp=vp@entry=0x0,
    cansleep=cansleep@entry=0, p=p@entry=0xfffff8002716d740)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdstate.c:1637
#16 0xffffffff80a45865 in nfsrv_freestateid (nd=nd@entry=0xfffffe00da2e0428,
    stateidp=stateidp@entry=0xfffffe00da2e0180, p=p@entry=0xfffff8002716d740)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdstate.c:6651
#17 0xffffffff80a558b1 in nfsrvd_freestateid (nd=0xfffffe00da2e0428,
    isdgram=<optimized out>, vp=<optimized out>, exp=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdserv.c:4775
#18 0xffffffff80a35ecf in nfsrvd_compound (nd=0xfffffe00da2e0428, isdgram=0,
    tag=<optimized out>, taglen=0, minorvers=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdsocket.c:1338
#19 nfsrvd_dorpc (nd=nd@entry=0xfffffe00da2e0428, isdgram=isdgram@entry=0,
    tag=<optimized out>, taglen=0, minorvers=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdsocket.c:633
#20 0xffffffff80a4b608 in nfs_proc (nd=0xfffffe00da2e0428,
    xid=<optimized out>, xprt=0xfffff8000a8a6400, rpp=<optimized out>)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:474
#21 nfssvc_program (rqst=0xfffff80027e51800, xprt=0xfffff8000a8a6400)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:358
#22 0xffffffff80e62dd0 in svc_executereq (rqstp=0xfffff80027e51800)
    at /usr/src/sys/rpc/svc.c:1053
#23 svc_run_internal (grp=grp@entry=0xfffff80003201100,
    ismaster=ismaster@entry=1) at /usr/src/sys/rpc/svc.c:1329
#24 0xffffffff80e6220f in svc_run (pool=0xfffff80003201000)
    at /usr/src/sys/rpc/svc.c:1408
#25 0xffffffff80a4bd73 in nfsrvd_nfsd (td=td@entry=0xfffff8002716d740,
    args=args@entry=0xfffffe00da2e09a0)
    at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:641
#26 0xffffffff80a66ada in nfssvc_nfsd (td=0xfffff8002716d740,
    uap=<optimized out>) at /usr/src/sys/fs/nfsserver/nfs_nfsdport.c:3877
#27 0xffffffff80dcc11c in sys_nfssvc (td=<optimized out>, uap=<optimized out>)
    at /usr/src/sys/nfs/nfs_nfssvc.c:107
#28 0xffffffff81069888 in syscallenter (td=0xfffff8002716d740)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#29 amd64_syscall (td=0xfffff8002716d740, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1192
#30 <signal handler called>
#31 0x00002687a11d1eda in ?? ()
Backtrace stopped: Cannot access memory at address 0x26879f30f8f8

-- 
You are receiving this mail because:
You are the assignee for the bug.