Re: FreeBSD panics possibly caused by nfs clients

From: Matthew L. Dailey <Matthew.L.Dailey_at_dartmouth.edu>
Date: Tue, 20 Feb 2024 19:21:01 UTC

Hi all,

I induced a panic on my CURRENT (20240215-d79b6b8ec267-268300) VM after
about 24 hours. This is the VM without any debugging enabled, so it only
confirms that the panics we've been experiencing still occur in CURRENT.
A disk issue prevented the dump, so all I have is the panic message,
pasted below.

The two test systems with full debugging are still running after a week 
and a half.

> You might want to set
> kern.kstack_pages=6
> in /boot/loader.conf in these setups.
> 
> I would normally expect double faults when a kernel stack is blown,
> but maybe there is a reason that you do not see that for a blown kernel
> stack. (The impact of increasing stack pages from 4->6 should be minimal.)
> 
> rick

Rick - I'm a little confused by the kstack_pages tunable and just want 
to clarify. Are you proposing that this might solve the panic issues 
we've been having, or that it will make the panics/dumps more useful by 
avoiding false positives? We've only ever seen that "double fault" once 
in over 100 observed panics, and that was only when we enabled just 
KASAN on a 14.0p4 system.

-Matt
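
For a sense of scale on the 4->6 change quoted above, here is a rough sketch of the per-thread arithmetic. It assumes an amd64 system with 4 KiB pages and takes 4 as the current value from the "4->6" in Rick's note; the function name is made up for illustration.

```python
# Assumption: amd64 with 4 KiB pages. The current value of 4 pages is
# taken from the "4->6" in the quoted suggestion, not measured here.
PAGE_SIZE = 4096


def kstack_bytes(pages: int, page_size: int = PAGE_SIZE) -> int:
    """Kernel stack size in bytes for a given kern.kstack_pages value."""
    return pages * page_size


current = kstack_bytes(4)
proposed = kstack_bytes(6)
extra_per_thread = proposed - current

print(current, proposed, extra_per_thread)  # 16384 24576 8192
```

Under those assumptions each kernel stack grows by 8 KiB, so the total cost scales with the number of threads on the system.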


[85751] Fatal trap 12: page fault while in kernel mode
[85751] cpuid = 3; apic id = 06
[85751] fault virtual address      = 0x4f0f760
[85751] fault code         = supervisor read data, page not present
[85751] instruction pointer        = 0x20:0xffffffff820022f7
[85751] stack pointer              = 0x28:0xfffffe010bdf8d50
[85751] frame pointer              = 0x28:0xfffffe010bdf8d80
[85751] code segment               = base 0x0, limit 0xfffff, type 0x1b
[85751]                    = DPL 0, pres 1, long 1, def32 0, gran 1
[85751] processor eflags   = interrupt enabled, resume, IOPL = 0
[85751] current process            = 0 (z_wr_int_h_3)
[85751] rdi: fffff802d1036900 rsi: fffff80416887300 rdx: fffff80416887380
[85751] rcx: fffff802d1036908  r8: 0000000000000100  r9: 8013070f000700ff
[85751] rax: 0000000004f0f748 rbx: fffff802d1036900 rbp: fffffe010bdf8d80
[85751] r10: fffff80412c4f708 r11: 0000000000000000 r12: fffff8000944ed58
[85751] r13: 0000000000000000 r14: 0000000004f0f748 r15: fffffe010caa9438
[85751] trap number                = 12
[85751] panic: page fault
[85751] cpuid = 3
[85751] time = 1708451091
[85751] KDB: stack backtrace:
[85751] #0 0xffffffff80b9803d at kdb_backtrace+0x5d
[85751] #1 0xffffffff80b4a8d5 at vpanic+0x135
[85751] #2 0xffffffff80b4a793 at panic+0x43
[85751] #3 0xffffffff81026b8f at trap_fatal+0x40f
[85751] #4 0xffffffff81026bdf at trap_pfault+0x4f
[85751] #5 0xffffffff80ffd9f8 at calltrap+0x8
[85751] #6 0xffffffff81fea83b at dmu_sync_late_arrival_done+0x6b
[85751] #7 0xffffffff8214a78e at zio_done+0xc6e
[85751] #8 0xffffffff821442cc at zio_execute+0x3c
[85751] #9 0xffffffff80bae402 at taskqueue_run_locked+0x182
[85751] #10 0xffffffff80baf692 at taskqueue_thread_loop+0xc2
[85751] #11 0xffffffff80b0484f at fork_exit+0x7f
[85751] #12 0xffffffff80ffea5e at fork_trampoline+0xe
[85751] Uptime: 23h49m11s
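
For anyone cross-checking the numbers in the trace above, the following is a small sketch that decodes three of them: the "time =" value, the uptime against the "[85751]" line prefix, and the distance between the fault address and rax. All constants are copied from the panic output; the helper name is hypothetical, and the uptime parser only handles the h/m/s form seen here.

```python
import re
from datetime import datetime, timezone

# Values copied from the panic output above.
PANIC_TIME = 1708451091      # "time = ..." (seconds since the Unix epoch)
UPTIME = "23h49m11s"         # "Uptime: ..."
LOG_PREFIX_SECONDS = 85751   # the "[85751]" prefix on each line
FAULT_ADDR = 0x4F0F760       # "fault virtual address"
RAX = 0x4F0F748              # rax (r14 holds the same value)


def uptime_seconds(text: str) -> int:
    """Convert an uptime string of the form '23h49m11s' to seconds."""
    match = re.fullmatch(r"(\d+)h(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected uptime format: {text!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


panic_utc = datetime.fromtimestamp(PANIC_TIME, tz=timezone.utc)

print(panic_utc.isoformat())                          # 2024-02-20T17:44:51+00:00
print(uptime_seconds(UPTIME) == LOG_PREFIX_SECONDS)   # True
print(hex(FAULT_ADDR - RAX))                          # 0x18
```

The fault address sitting 0x18 bytes past the value in rax is consistent with a read through a bad pointer at a small offset, but that is only a reading of the registers, not a confirmed diagnosis.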