[Bug 274346] kernel panic/page fault in nfs_commonkrpc.c::newnfs_request(), due to duplicate hostid's
Date: Sun, 08 Oct 2023 02:27:19 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274346

            Bug ID: 274346
           Summary: kernel panic/page fault in nfs_commonkrpc.c::newnfs_request(), due to duplicate hostid's
           Product: Base System
           Version: 14.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: freebsd@kumba.dev

So I have managed to trigger a kernel panic in 14.0-BETA5 in the NFS subsystem, but this is partly due to a mistake I myself made by not changing the hostid of a system that's a clone of another active system. The first system is running 13.2-RELEASE-p4 and the cloned system is running 14.0-BETA5, newly upgraded.

There are several elements that lead to this panic:
- Both systems have the same hostid
- Both systems mount the same remote NFS share from a third system
- Have the 13.2-RELEASE-p4 system start doing a job on the remote share, like compiling code (e.g., /usr/ports is on this share)
- Have the cloned system running 14.0-BETA5 attempt to unmount the remote share
- The 14.0-BETA5 system will crash

I know it's due to duplicate hostids, because the below message is printed on the console immediately before the kernel crashes:

>
> Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
>

And the printf() for that exact string is in the crashing function right where GDB says the crash happens, in nfs_commonkrpc.c, function newnfs_request(), line 1212. I'm just not sure whether the fault is in the if statement immediately preceding the printf() call or in the if statement that follows it. The next call in the machine code is memcmp(), so I am assuming a NULL dereference of some kind.

My kernel is a custom build, but this can be triggered on a GENERIC kernel as well: my first crash happened on GENERIC, right before I was set to reboot into my rebuilt custom kernel after doing the second `freebsd-update install` phase to upgrade to 14.0-BETA5. At that time, I had crashdumps disabled.
So the below crash info is from that custom kernel, after I enabled crashdumps and re-triggered the crash (it's at least reproducible...):

> Unread portion of the kernel message buffer:
> [179]
> [179]
> [179] Fatal trap 12: page fault while in kernel mode
> [179] cpuid = 0; apic id = 00
> [179] fault virtual address = 0x4
> [179] fault code = supervisor read data, page not present
> [179] instruction pointer = 0x20:0xffffffff809e9893
> [179] stack pointer = 0x28:0xfffffe00a233e800
> [179] frame pointer = 0x28:0xfffffe00a233e800
> [179] code segment = base 0x0, limit 0xfffff, type 0x1b
> [179] = DPL 0, pres 1, long 1, def32 0, gran 1
> [179] processor eflags = interrupt enabled, resume, IOPL = 0
> [179] current process = 87256 (umount)
> [179] rdi: fffff800077761e4 rsi: 0000000000000004 rdx: 0000000000000010
> [179] rcx: 0000000000000000 r8: 0000000000000024 r9: fffffe00a233f000
> [179] rax: 0000000000000000 rbx: fffffe00a251b020 rbp: fffffe00a233e800
> [179] r10: 0000000000000585 r11: 000000007ff9687f r12: fffff80007776010
> [180] r13: fffff80003abb800 r14: fffffe00a233ea18 r15: fffff80007776000
> [180] trap number = 12
> [180] panic: page fault
> [180] cpuid = 0
> [180] time = 1696723338
> [180] KDB: stack backtrace:
> [180] #0 0xffffffff806b5edd at kdb_backtrace+0x5d
> [180] #1 0xffffffff8066aa20 at vpanic+0x130
> [180] #2 0xffffffff8066a8e3 at panic+0x43
> [180] #3 0xffffffff809ee34c at trap_fatal+0x40c
> [180] #4 0xffffffff809ee39e at trap_pfault+0x4e
> [180] #5 0xffffffff809c6288 at calltrap+0x8
> [180] #6 0xffffffff8053f804 at newnfs_request+0x10a4
> [180] #7 0xffffffff8054dbad at nfsrpc_destroysession+0x11d
> [180] #8 0xffffffff80557252 at nfscl_umount+0x312
> [180] #9 0xffffffff80589470 at nfs_unmount+0x70
> [180] #10 0xffffffff8073c4ad at vfs_unmount_sigdefer+0x2d
> [180] #11 0xffffffff80741e37 at dounmount+0x787
> [180] #12 0xffffffff80741645 at kern_unmount+0x2f5
> [180] #13 0xffffffff809eeaf9 at amd64_syscall+0x109
> [180] #14 0xffffffff809c6b9b at fast_syscall_common+0xf8
> [180] Timeout initializing vt_vga
> [180] Uptime: 3m0s
> [180] Dumping 447 out of 8077 MB:..4%..11%..22%..33%..43%..51%..61%..72%..83%..93%
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> 57 /usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
> (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> #1 doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:405
> #2 0xffffffff8066a5b7 in kern_reboot (howto=260)
>     at ../../../kern/kern_shutdown.c:526
> #3 0xffffffff8066aa8d in vpanic (fmt=0xffffffff80a3bcd1 "%s",
>     ap=ap@entry=0xfffffe00a233e680) at ../../../kern/kern_shutdown.c:970
> #4 0xffffffff8066a8e3 in panic (fmt=<unavailable>)
>     at ../../../kern/kern_shutdown.c:894
> #5 0xffffffff809ee34c in trap_fatal (frame=0xfffffe00a233e740, eva=4)
>     at ../../../amd64/amd64/trap.c:952
> #6 0xffffffff809ee39e in trap_pfault (frame=0xfffffe00a233e740,
>     usermode=false, signo=<optimized out>, ucode=<optimized out>)
>     at ../../../amd64/amd64/trap.c:760
> #7 <signal handler called>
> #8 memcmp () at ../../../amd64/amd64/support.S:115
> #9 0xffffffff8053f804 in newnfs_request (nd=nd@entry=0xfffffe00a233ea18,
>     nmp=nmp@entry=0xfffff80003abb800, clp=clp@entry=0x0,
>     nrp=nrp@entry=0xfffff80003abbcd8, vp=vp@entry=0x0,
>     td=td@entry=0xfffffe00a251b020, cred=0xfffff8000765aa00, prog=100003,
>     vers=4, retsum=0x0, toplevel=1, xidp=0x0, dssep=0x0)
>     at ../../../fs/nfs/nfs_commonkrpc.c:1212
> #10 0xffffffff8054dbad in nfsrpc_destroysession (
>     nmp=nmp@entry=0xfffff80003abb800, tsep=0xfffff80007776010,
>     tsep@entry=0x0, cred=cred@entry=0xfffff8000765aa00,
>     p=p@entry=0xfffffe00a251b020) at ../../../fs/nfs/nfs_commonsubs.c:5151
> #11 0xffffffff80557252 in nfscl_umount (nmp=nmp@entry=0xfffff80003abb800,
>     p=p@entry=0xfffffe00a251b020, dhp=dhp@entry=0x0)
>     at ../../../fs/nfsclient/nfs_clstate.c:2094
> #12 0xffffffff80589470 in nfs_unmount (mp=0xfffffe00a4058000,
>     mntflags=<optimized out>) at ../../../fs/nfsclient/nfs_clvfsops.c:1903
> #13 0xffffffff8073c4ad in vfs_unmount_sigdefer (mp=0xfffffe00a4058000,
>     mntflags=134217728) at ../../../kern/vfs_init.c:185
> #14 0xffffffff80741e37 in dounmount (mp=0xfffff800077761e4,
>     mp@entry=0xfffffe00a4058000, flags=flags@entry=134217728,
>     td=td@entry=0xfffffe00a251b020) at ../../../kern/vfs_mount.c:2327
> #15 0xffffffff80741645 in kern_unmount (td=0xfffffe00a251b020,
>     path=<optimized out>, flags=134217728) at ../../../kern/vfs_mount.c:1785
> #16 0xffffffff809eeaf9 in syscallenter (td=0xfffffe00a251b020)
>     at ../../../amd64/amd64/../../kern/subr_syscall.c:187
> #17 amd64_syscall (td=0xfffffe00a251b020, traced=0)
>     at ../../../amd64/amd64/trap.c:1197
> #18 <signal handler called>
> #19 0x0000244bc41489ba in ?? ()
> Backtrace stopped: Cannot access memory at address 0x244bc20f4c18
> (kgdb)
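
For what it's worth, the register dump seems to fit the NULL dereference guess. In the dump, rsi (which carries the second argument under the amd64 calling convention) is 0x4, the same value as the fault virtual address, and rdx (the third argument, the length) is 0x10. That is what one would expect if memcmp() were handed the address of a 16-byte member sitting at offset 4 inside a structure whose base pointer is NULL. The small userland sketch below only illustrates that arithmetic. `struct example_rec`, its members, and `id_address()` are hypothetical names made up for this illustration, not the real kernel structures, and I have not confirmed which pointer in newnfs_request() is actually NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Length matching rdx = 0x10 in the register dump. */
#define EXAMPLE_ID_LEN 16

/*
 * Hypothetical layout (an assumption for illustration only, NOT the real
 * kernel structure): one 4-byte member followed by a 16-byte ID.
 */
struct example_rec {
	uint32_t head;
	uint8_t  id[EXAMPLE_ID_LEN];
};

/*
 * Compute the address that would be passed to memcmp() for the "id"
 * member, given the base address of the structure. Nothing is
 * dereferenced here, so it is safe to call with a base of 0.
 */
static uintptr_t
id_address(uintptr_t base)
{
	return (base + offsetof(struct example_rec, id));
}
```

With a NULL base, `id_address(0)` evaluates to 0x4, which lines up with both rsi and the reported fault address. The real offset and member in the kernel may differ; the point is only that a small non-zero fault address like 0x4 usually means "NULL plus a member offset" rather than a wild pointer.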