Re: panic: nfsv4root ref cnt cpuid = 1

From: J David <j.david.lists_at_gmail.com>
Date: Mon, 23 Sep 2024 21:14:42 UTC
We've had some other kernel panics that may be related to this. At the
very least, the call stack is the same up to nfsrpc_lookup+0x87f.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x28
fault code            = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff809da260
stack pointer         = 0x28:0xfffffe0111f18438
frame pointer         = 0x28:0xfffffe0111f18470
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 14676 (sh)
rdi: 0000000000000028 rsi: fffff80071cee200 rdx: 0000000000000000
rcx: 0000000000000000  r8: 0000000000000032  r9: fffffe0111f19000
rax: 0000000000000000 rbx: fffff80161e4b000 rbp: fffffe0111f18470
r10: 00000000000001f4 r11: fffff8024e1e5760 r12: 0000000000000000
r13: 0000000000000000 r14: fffff80071cee200 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 0
time = 1727064012
KDB: stack backtrace:
#0 0xffffffff80b7fefd at kdb_backtrace+0x5d
#1 0xffffffff80b32bd1 at vpanic+0x131
#2 0xffffffff80b32a93 at panic+0x43
#3 0xffffffff8100091b at trap_fatal+0x40b
#4 0xffffffff81000966 at trap_pfault+0x46
#5 0xffffffff80fd6d48 at calltrap+0x8
#6 0xffffffff809f9eef at nfsrpc_lookup+0x87f
#7 0xffffffff80a0e2fd at nfs_lookup+0x43d
#8 0xffffffff80c0341a at vop_sigdefer+0x2a
#9 0xffffffff8302c3a7 at null_lookup+0xc7
#10 0xffffffff80c08745 at vfs_lookup+0x425
#11 0xffffffff80c079b8 at namei+0x238
#12 0xffffffff80c2d2da at vn_open_cred+0x53a
#13 0xffffffff80c239a8 at openatfp+0x268
#14 0xffffffff80c236b8 at sys_open+0x28
#15 0xffffffff810011c0 at amd64_syscall+0x100
#16 0xffffffff80fd765b at fast_syscall_common+0xf8

I don't know whether this gives any more insight into, or
confirmation of, your theory about the problem, but it seems worth
sharing.
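
For what it's worth, my (unconfirmed) reading of the trap frame is a
NULL pointer dereference: the fault virtual address is 0x28, %rdi is
0x28, and %rax/%r12/%r13/%r15 are all zero, which is the pattern you
get when reading a structure member at offset 0x28 through a NULL
pointer. A minimal sketch of that pattern (the struct and field here
are hypothetical, not the actual nfsrpc_lookup() data structures):

    #include <stdint.h>

    struct obj {
        char     pad[0x28];  /* whatever occupies the first 0x28 bytes */
        uint64_t field;      /* member at offset 0x28 */
    };

    uint64_t
    read_field(struct obj *p)
    {
        /*
         * If p is NULL, this load touches address 0x0 + 0x28, i.e.
         * exactly the "fault virtual address = 0x28" reported above.
         */
        return (p->field);
    }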

We got three of these panics (and two of the first kind) on different
machines in the past 24 hours, so in addition to trying the patch,
I'll definitely at least be experimenting with separate bulk NFS
mounts in place of the current single NFS mount with bulk nullfs
mounts on top; see the sketch below.
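
Concretely, the experiment looks something like this (the server name
and paths are made up for illustration), expressed as /etc/fstab
entries:

    # current layout: one NFS mount, bulk nullfs mounts on top of it
    nfsserver:/export/data  /data         nfs     rw  0  0
    /data                   /work/a/data  nullfs  rw  0  0
    /data                   /work/b/data  nullfs  rw  0  0

    # experiment: bulk NFS mounts directly, no nullfs layer
    nfsserver:/export/data  /work/a/data  nfs     rw  0  0
    nfsserver:/export/data  /work/b/data  nfs     rw  0  0

That takes null_lookup() out of the lookup path entirely, so if the
panics stop it would point at the nullfs-over-NFS combination.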

Thanks!