Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic

From: Michael Butler <imb_at_protected-networks.net>
Date: Mon, 12 Feb 2024 17:45:27 UTC

On 2/12/24 12:36, John Baldwin wrote:
> On 2/10/24 2:09 PM, Michael Butler wrote:
>> I have stability problems with anything at or after this commit
>> (b377ff8) on an amd64 laptop. While I see the following panic logged, no
>> crash dump is preserved :-( It happens after ~5-6 minutes running in KDE
>> (X).
>>
>> Reverting to 36efc64 seems to work reliably (after ACPI changes but
>> before the problematic PCI one)
>>
>> kernel: Fatal trap 12: page fault while in kernel mode
>> kernel: cpuid = 2; apic id = 02
>> kernel: fault virtual address     = 0x48
>> kernel: fault code                = supervisor read data, page not present
>> kernel: instruction pointer       = 0x20:0xffffffff80acb962
>> kernel: stack pointer             = 0x28:0xfffffe00c4318d80
>> kernel: frame pointer             = 0x28:0xfffffe00c4318d80
>> kernel: code segment              = base 0x0, limit 0xfffff, type 0x1b
>> kernel:                   = DPL 0, pres 1, long 1, def32 0, gran 1
>> kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
>> kernel: current process           = 2 (clock (0))
>> kernel: rdi: fffff802e460c000 rsi: 0000000000000000 rdx: 0000000000000002
>> kernel: rcx: 0000000000000000  r8: 000000000000001e  r9: fffffe00c4319000
>> kernel: rax: 0000000000000002 rbx: fffff802e460c000 rbp: fffffe00c4318d80
>> kernel: r10: 0000000000001388 r11: 000000007ffc765d r12: 000f000000000000
>> kernel: r13: 0002000000000000 r14: fffff8000193e740 r15: 0000000000000000
>> kernel: trap number               = 12
>> kernel: panic: page fault
>> kernel: cpuid = 2
>> kernel: time = 1707573802
>> kernel: Uptime: 6m19s
>> kernel: Dumping 942 out of 16242 MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92%
>> kernel: Dump complete
>> kernel: Automatic reboot in 15 seconds - press a key on the console to abort
> 
> Without a stack trace it is pretty much impossible to debug a panic like
> this.  Do you have KDB_TRACE enabled in your kernel config?  I'm also not
> sure how the PCI changes can result in a panic post-boot.  If you were
> going to have problems they would be during device attach, not after you
> are booted and running X.
>
> Short of a stack trace, you can at least use lldb or gdb to lookup the
> source line associated with the faulting instruction pointer (as long as
> it isn't in a kernel module), e.g. for gdb you would use
> 'gdb /boot/kernel/kernel' and then 'l *<instruction pointer address>',
> e.g. from above: 'l *0xffffffff80acb962'
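
For reference, the lookup suggested above can be scripted. The following is only a minimal sketch: the helper names (faulting_address, gdb_list_command) are hypothetical, the regular expression assumes the amd64 trap message format shown in the log above, and the kernel path is the one quoted in the suggestion. It only builds the gdb command line (using gdb's -batch and -ex options) and does not run it.

```python
import re
import shlex

# Matches the "instruction pointer" line of the trap message, e.g.
# "kernel: instruction pointer       = 0x20:0xffffffff80acb962".
# Assumption: the value is "<segment selector>:<address>", both in hex.
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)")


def faulting_address(panic_text):
    """Return the faulting instruction address as a hex string, or None."""
    match = IP_RE.search(panic_text)
    return match.group(1) if match else None


def gdb_list_command(address, kernel="/boot/kernel/kernel"):
    """Build a non-interactive gdb invocation that lists source at address.

    Equivalent to starting 'gdb /boot/kernel/kernel' and typing
    'l *<address>' at the prompt.
    """
    return ["gdb", "-batch", "-ex", "list *" + address, kernel]


if __name__ == "__main__":
    sample = "kernel: instruction pointer       = 0x20:0xffffffff80acb962"
    addr = faulting_address(sample)
    if addr is None:
        raise SystemExit("no instruction pointer line found")
    # Print a shell-ready command line instead of executing it.
    print(shlex.join(gdb_list_command(addr)))
```

As noted in the quoted text, this only helps when the address lies in the kernel proper rather than in a kernel module, and a useful source listing assumes the kernel was built with debug symbols.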

I suspect the absence of a core dump was due to my use of tmpfs for /tmp 
and /var/tmp while still having clear_tmp enabled in rc.conf (that may 
touch swap on restart).

Since then, I've removed tmpfs and everything under /usr/obj, and am 
rebuilding from scratch. I'll update when it finally finishes (i5-3340s 
are quick :-()

	Michael