[Bug 267028] kernel panics when booting with both (zfs,ko or vboxnetflt,ko or acpi_wmi.ko) and amdgpu.ko
Date: Mon, 20 Mar 2023 23:03:19 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #140 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
(In reply to George Mitchell from comment #137)

All 4 are examples related to dbuf_evict_thread (a.k.a. zfs dbuf
related crashes), as I feared. All 4 look like:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x7
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff82600ba6

Looks to be in:

 5    1 0xffffffff82600000   3df128 zfs.ko

panic: page fault
cpuid = 1
time = 1679349400
KDB: stack backtrace:
#0 0xffffffff80c66ee5 at kdb_backtrace+0x65
#1 0xffffffff80c1bbef at vpanic+0x17f
#2 0xffffffff80c1ba63 at panic+0x43
#3 0xffffffff810addf5 at trap_fatal+0x385
#4 0xffffffff810ade4f at trap_pfault+0x4f
#5 0xffffffff81084fd8 at calltrap+0x8
#6 0xffffffff827ac768 at zap_evict_sync+0x68
#7 0xffffffff8267d74a at dbuf_destroy+0xba
#8 0xffffffff82683129 at dbuf_evict_one+0xf9
#9 0xffffffff8267b43d at dbuf_evict_thread+0x31d
#10 0xffffffff80bd8abe at fork_exit+0x7e
#11 0xffffffff8108604e at fork_trampoline+0xe

#6  0xffffffff810ade4f in trap_pfault (frame=0xfffffe00b3bb6d00,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:763
#7  <signal handler called>
#8  avl_destroy_nodes (tree=tree@entry=0xfffff8001a80b5a0,
    cookie=cookie@entry=0xfffffe00b3bb6dd0)
    at /usr/src/sys/contrib/openzfs/module/avl/avl.c:1023
#9  0xffffffff827ac768 in mze_destroy (zap=0xfffff8001a80b480)
    at /usr/src/sys/contrib/openzfs/module/zfs/zap_micro.c:402

A question would be if this repeats based on amdgpu having been
loaded (again last) but no X11 like activity having ever been
started: limiting amdgpu use to just the load activity or as close
to that limited of use as is possible. (This is separate from your
zfs load time adjustment test.)

My guess is that the content of some memory area(s) is being
trashed in your context.
I'm not sure how to track down what is doing the trashing or where
all the trashed area(s) are, if that is what is going on.

At least we now have a clue how to get the specific type of crash.
Before, I had no clue what an example initial-context might be like.

Note: Changing the load order should get a matching kldstat report
to indicate the address ranges that end up involved.
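
For anyone repeating the "Looks to be in" check after a load-order change, here is a minimal sketch of matching a faulting address against kldstat address ranges. It is illustrative only and not a tool used in this bug report: the names Module, parse_kldstat and find_module are hypothetical, the sample text contains only the zfs.ko line quoted in this comment, and it assumes the usual kldstat columns (Id, Refs, Address, Size in hex without a 0x prefix, Name).

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Module(NamedTuple):
    """One row of kldstat output (hypothetical helper type)."""
    name: str
    base: int
    size: int


# Matches rows such as: " 5    1 0xffffffff82600000   3df128 zfs.ko"
# Assumption: Id, Refs, Address, Size (hex, no 0x prefix), Name.
KLDSTAT_ROW = re.compile(
    r"^\s*\d+\s+\d+\s+(0x[0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(\S+)\s*$"
)


def parse_kldstat(text: str) -> List[Module]:
    """Collect (name, base, size) for each row; the header line is skipped."""
    modules = []
    for line in text.splitlines():
        m = KLDSTAT_ROW.match(line)
        if m:
            modules.append(Module(m.group(3), int(m.group(1), 16), int(m.group(2), 16)))
    return modules


def find_module(modules: List[Module], addr: int) -> Optional[Tuple[Module, int]]:
    """Return (module, offset) if addr lies in [base, base + size), else None."""
    for mod in modules:
        if mod.base <= addr < mod.base + mod.size:
            return mod, addr - mod.base
    return None


# Only the zfs.ko row comes from the comment above; a real report has more rows.
KLDSTAT = """\
Id Refs Address                Size Name
 5    1 0xffffffff82600000   3df128 zfs.ko
"""

modules = parse_kldstat(KLDSTAT)
hit = find_module(modules, 0xffffffff82600ba6)  # instruction pointer from the trap
if hit is not None:
    mod, off = hit
    print(f"{mod.name}+{off:#x}")
```

With the values quoted above, the instruction pointer falls at offset 0xba6 inside zfs.ko. Addresses outside every listed range (for example the kernel's own frames, since the kernel row is not in the sample) return None.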