Re: ZFS + FreeBSD XEN dom0 panic
- In reply to: Ze Dupsys : "Re: ZFS + FreeBSD XEN dom0 panic"
Date: Thu, 14 Apr 2022 07:49:27 UTC
On Thu, Apr 14, 2022 at 10:20:25AM +0300, Ze Dupsys wrote:
> On 2022.04.05. 18:22, Roger Pau Monné wrote:
> > I've pushed the changes to:
> >
> > http://xenbits.xen.org/gitweb/?p=people/royger/freebsd.git;a=shortlog;h=refs/heads/for-leak
> >
> > (This is on top of main branch).
> >
> > I'm also attaching the two patches on this email.
> >
> > Let me know if those make a difference to stabilize the system.
> >
>
> I do not know should i start a new thread, but i have captured another
> panic, new trace, this is on different machine, similar setup, RELEASE-13.0
> + 2 mentioned patches.
>
> I do not know how to reliably repeat it, nor the cause. But i have suspicion
> that this happens when doing some of steps like: create new ZVOL, turn one
> VM off, add new HDD/ZVOL path to VM in cfg file, start VM back up, inside
> this VM do some HDD load on newly added HDD (install stuff, extract data,
> etc.) + something of: shut all VMs down one by one, then do init 0 or 6, or
> create new other VM. On this machine i can't experiment too much, no serial
> output available either.

So you haven't seen this panic with the 3rd patch applied? I guess that's also possible because when testing the 3rd patch you are using a HEAD kernel rather than stable/13, so the ZFS code might have changed.
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 06
> fault virtual address = 0x68
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff821dc99d
> stack pointer = 0x28:0xfffffe00c6b497d0
> frame pointer = 0x28:0xfffffe00c6b49870
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (xbbd26 taskq)
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1649915274
> KDB: stack backtrace:
> #0 0xffffffff80c57385 at kdb_backtrace+0x65
> #1 0xffffffff80c09d61 at vpanic+0x181
> #2 0xffffffff80c09bd3 at panic+0x43
> #3 0xffffffff8108b187 at trap+0xbc7
> #4 0xffffffff8108b1df at trap+0xc1f
> #5 0xffffffff8108a83d at trap+0x27d
> #6 0xffffffff81061818 at calltrap+0x8
> #7 0xffffffff821c035a at dmu_read+0x2a
> #8 0xffffffff8218da3a at zvol_geom_bio_strategy+0x2aa
> #9 0xffffffff80a7f074 at xbd_instance_create+0xa3d4
> #10 0xffffffff80a7b00a at xbd_instance_create+0x636a
> #11 0xffffffff80c6b021 at taskqueue_run+0x2a1
> #12 0xffffffff80c6c33c at taskqueue_thread_loop+0xac
> #13 0xffffffff80bc7c9e at fork_exit+0x7e
> #14 0xffffffff8106289e at fork_trampoline+0xe
> Uptime: 24m0s
> (ada0:ahcich0:0:0:0): spin-down
> (ada1:ahcich1:0:0:0): spin-down
> (ada2:ahcich2:0:0:0): spin-down
> Dumping 2922 out of 6104
>
> cat panic.log| sed -Ee 's/^#[0-9]* //' -e 's/ .*//' | xargs addr2line -e /usr/lib/debug/boot/kernel/kernel.debug
> /usr/src/sys/kern/subr_bus.c:2410
> /usr/src/sys/kern/kern_racct.c:632
> /usr/src/sys/kern/kern_racct.c:617
> /usr/src/sys/dev/isci/isci_sysctl.c:92
> /usr/src/sys/dev/isci/isci_sysctl.c:0
> /usr/src/sys/dev/isci/isci_oem_parameters.c:130
> /usr/src/sys/dev/hyperv/input/hv_kbd.c:540
> ??:0
> ??:0
> /usr/src/sys/dev/xen/blkback/blkback.c:3083
> /usr/src/sys/xen/xenbus/xenbusvar.h:96
> /usr/src/sys/kern/subr_kobj.c:145
> /usr/src/sys/kern/subr_module.c:255
> /usr/src/sys/kern/kern_event.c:0
> /usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1158
>
> Full output of (kgdb) backtrace
> #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1 doadump (textdump=<optimized out>) at
> /usr/src/sys/kern/kern_shutdown.c:399
> #2 0xffffffff80c09956 in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:486
> #3 0xffffffff80c09dd0 in vpanic (fmt=<optimized out>, ap=<optimized out>)
> at /usr/src/sys/kern/kern_shutdown.c:919
> #4 0xffffffff80c09bd3 in panic (fmt=<unavailable>) at
> /usr/src/sys/kern/kern_shutdown.c:843
> #5 0xffffffff8108b187 in trap_fatal (frame=0xfffffe00c6b49710, eva=104) at
> /usr/src/sys/amd64/amd64/trap.c:915
> #6 0xffffffff8108b1df in trap_pfault (frame=frame@entry=0xfffffe00c6b49710,
> usermode=false, signo=<optimized out>, signo@entry=0x0, ucode=<optimized
> out>, ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:732
> #7 0xffffffff8108a83d in trap (frame=0xfffffe00c6b49710) at
> /usr/src/sys/amd64/amd64/trap.c:398
> #8 <signal handler called>
> #9 0xffffffff821dc99d in dbuf_write_children_ready (zio=<optimized out>,
> buf=<optimized out>, vdb=0x0) at
> /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:4642

If this trace is correct, the error comes from passing vdb == NULL to dbuf_write_children_ready():

https://cgit.freebsd.org/src/tree/sys/contrib/openzfs/module/zfs/dbuf.c?h=stable/13#n4551

The function unconditionally dereferences (v)db, so passing NULL will trigger a page fault. I have no idea, however, how you can get into this state.

It might be worth posting the trace to freebsd-fs@freebsd.org in order to get some feedback from the ZFS people. It's possible the issue is with blkback, but I would benefit from some help figuring out what's wrong with the data I'm providing to d_strategy.

Please Cc me on the email if you send it to freebsd-fs@.

Thanks, Roger.
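The reported "fault virtual address = 0x68" (eva=104 in the trap_fatal frame) is consistent with the NULL vdb explanation: reading a member through a NULL struct pointer makes the CPU load from address 0 plus that member's offset. The minimal C sketch below only does that arithmetic. The struct and its member name are hypothetical stand-ins chosen so the member lands at 0x68 on an LP64 target such as amd64; they are not the real OpenZFS dmu_buf_impl_t layout, and which field dbuf_write_children_ready() actually reads first has not been checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL layout, NOT the real OpenZFS dmu_buf_impl_t: thirteen
 * pointer-sized members place field_at_0x68 at byte offset 13 * 8 = 104
 * (0x68) on an LP64 target such as amd64.
 */
struct hypothetical_dbuf {
	void	*pad[13];
	int	 field_at_0x68;
};

/* Fail the build if the assumed LP64 layout does not hold on this target. */
static_assert(offsetof(struct hypothetical_dbuf, field_at_0x68) == 0x68,
    "hypothetical layout assumes LP64 (8-byte pointers)");

/*
 * Address the CPU would load from when code reads db->field_at_0x68 while
 * db holds `base`. Pure arithmetic: nothing is dereferenced, so it is safe
 * to call with base == 0 to model the vdb == NULL case.
 */
static uintptr_t
member_load_address(uintptr_t base)
{
	return (base + offsetof(struct hypothetical_dbuf, field_at_0x68));
}
```

On amd64, member_load_address(0) evaluates to 0x68, the same value as the reported fault address; the static_assert makes the sketch refuse to build on a target where the assumed layout does not put the member at that offset.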