Re: ZFS + FreeBSD XEN dom0 panic
- In reply to: Ze Dupsys : "Re: ZFS + FreeBSD XEN dom0 panic"
Date: Mon, 28 Mar 2022 13:28:25 UTC
On Sun, Mar 27, 2022 at 08:42:23PM +0300, Ze Dupsys wrote:
> On Sun, Mar 27, 2022 at 12:13 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Sun, Mar 27, 2022 at 12:38:00AM +0200, Ze Dupsys wrote:
> > > ==== COUNT: 2
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 02
> > > fault virtual address = 0x68
> > > fault code = supervisor read data, page not present
> > > instruction pointer = 0x20:0xffffffff824a599d
> > > stack pointer = 0x28:0xfffffe00b1e27910
> > > frame pointer = 0x28:0xfffffe00b1e279b0
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = interrupt enabled, resume, IOPL = 0
> > > current process = 0 (xbbd7 taskq)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 1
> > > time = 1646122723
> > > KDB: stack backtrace:
> > > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > > #1 0xffffffff80c09f01 at vpanic+0x181
> > > #2 0xffffffff80c09d73 at panic+0x43
> > > #3 0xffffffff8108b1a7 at trap+0xbc7
> > > #4 0xffffffff8108b1ff at trap+0xc1f
> > > #5 0xffffffff8108a85d at trap+0x27d
> > > #6 0xffffffff81061b18 at calltrap+0x8
> > > #7 0xffffffff8248935a at dmu_read+0x2a
> > > #8 0xffffffff82456a3a at zvol_geom_bio_strategy+0x2aa
> > > #9 0xffffffff80a7f214 at xbd_instance_create+0xa394
> > > #10 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> > > #11 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> > > #12 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> > > #13 0xffffffff80bc7e3e at fork_exit+0x7e
> > > #14 0xffffffff81062b9e at fork_trampoline+0xe
> > >
> > >
> > > ==== COUNT: 1
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 02
> > > fault virtual address = 0x148
> > > fault code = supervisor read data, page not present
> > > instruction pointer = 0x20:0xffffffff8248cef4
> > > stack pointer = 0x28:0xfffffe009941d9a0
> > > frame pointer = 0x28:0xfffffe009941d9a0
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = interrupt enabled, resume, IOPL = 0
> > > current process = 0 (xbbd1 taskq)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 1
> > > time = 1646129773
> > > KDB: stack backtrace:
> > > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > > #1 0xffffffff80c09f01 at vpanic+0x181
> > > #2 0xffffffff80c09d73 at panic+0x43
> > > #3 0xffffffff8108b1a7 at trap+0xbc7
> > > #4 0xffffffff8108b1ff at trap+0xc1f
> > > #5 0xffffffff8108a85d at trap+0x27d
> > > #6 0xffffffff81061b18 at calltrap+0x8
> > > #7 0xffffffff825cb76e at zil_open+0xe
> > > #8 0xffffffff82456d02 at zvol_ensure_zilog+0xb2
> > > #9 0xffffffff82456818 at zvol_geom_bio_strategy+0x88
> > > #10 0xffffffff80a7f214 at xbd_instance_create+0xa394
> > > #11 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> > > #12 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> > > #13 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> > > #14 0xffffffff80bc7e3e at fork_exit+0x7e
> > > #15 0xffffffff81062b9e at fork_trampoline+0xe
> >
> > Hm, those last ones are in ZFS code, can you try to get the line
> > numbers for those?
> >
> > Maybe it's blkback providing bad data to the disk open functions.
> >
> > Since you are doing so much testing, it might make sense for you to
> > use a debug FreeBSD kernel rather than a production one (one with
> > WITNESS and INVARIANTS enabled).
> >
> > Thanks, Roger.
> >
> > ---8<---
> > diff --git a/sys/dev/xen/blkback/blkback.c b/sys/dev/xen/blkback/blkback.c
> > index 33414295bf5e..4007a93a54c7 100644
> > --- a/sys/dev/xen/blkback/blkback.c
> > +++ b/sys/dev/xen/blkback/blkback.c
> > @@ -2774,19 +2774,12 @@ xbb_free_communication_mem(struct xbb_softc *xbb)
> >  static int
> >  xbb_disconnect(struct xbb_softc *xbb)
> >  {
> > -	struct gnttab_unmap_grant_ref ops[XBB_MAX_RING_PAGES];
> > -	struct gnttab_unmap_grant_ref *op;
> > -	u_int ring_idx;
> > -	int error;
> > -
> >  	DPRINTF("\n");
> >
> > -	if ((xbb->flags & XBBF_RING_CONNECTED) == 0)
> > -		return (0);
> > -
> >  	mtx_unlock(&xbb->lock);
> >  	xen_intr_unbind(&xbb->xen_intr_handle);
> > -	taskqueue_drain(xbb->io_taskqueue, &xbb->io_task);
> > +	if (xbb->io_taskqueue != NULL)
> > +		taskqueue_drain(xbb->io_taskqueue, &xbb->io_task);
> >  	mtx_lock(&xbb->lock);
> >
> >  	/*
> > @@ -2796,19 +2789,28 @@ xbb_disconnect(struct xbb_softc *xbb)
> >  	if (xbb->active_request_count != 0)
> >  		return (EAGAIN);
> >
> > -	for (ring_idx = 0, op = ops;
> > -	     ring_idx < xbb->ring_config.ring_pages;
> > -	     ring_idx++, op++) {
> > -		op->host_addr = xbb->ring_config.gnt_addr
> > -			      + (ring_idx * PAGE_SIZE);
> > -		op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
> > -		op->handle = xbb->ring_config.handle[ring_idx];
> > -	}
> > +	if (xbb->flags & XBBF_RING_CONNECTED) {
> > +		struct gnttab_unmap_grant_ref ops[XBB_MAX_RING_PAGES];
> > +		struct gnttab_unmap_grant_ref *op;
> > +		unsigned int ring_idx;
> > +		int error;
> > +
> > +		for (ring_idx = 0, op = ops;
> > +		     ring_idx < xbb->ring_config.ring_pages;
> > +		     ring_idx++, op++) {
> > +			op->host_addr = xbb->ring_config.gnt_addr
> > +				      + (ring_idx * PAGE_SIZE);
> > +			op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
> > +			op->handle = xbb->ring_config.handle[ring_idx];
> > +		}
> >
> > -	error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
> > -	    xbb->ring_config.ring_pages);
> > -	if (error != 0)
> > -		panic("Grant table op failed (%d)", error);
> > +		error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
> > +		    xbb->ring_config.ring_pages);
> > +		if (error != 0)
> > +			panic("Grant table op failed (%d)", error);
> > +
> > +		xbb->flags &= ~XBBF_RING_CONNECTED;
> > +	}
> >
> >  	xbb_free_communication_mem(xbb);
> >
> > @@ -2839,7 +2841,6 @@ xbb_disconnect(struct xbb_softc *xbb)
> >  		xbb->request_lists = NULL;
> >  	}
> >
> > -	xbb->flags &= ~XBBF_RING_CONNECTED;
> >  	return (0);
> >  }
>
> Hello,
>
> I applied given patch, i did not have enough time to test thoroughly,
> but for 3 hours system was running without panic whereas previously it
> would crash in around 1,5 hours in similar settings. Till Thursday i
> will not be able to test.

Well, I guess that's good. Let's see if you can trigger those ZFS
related issues again and provide the file:line numbers for the
backtrace.

> About those ZFS panic traces, i will try to get line numbers, but the
> problem is that i do not have /usr/lib/debug/boot/kernel/kernel.debug
> for FreeBSD 13.0-RELEASE-p7. I tried on laptop's VirtualBox to set up
> 13.0-RELEASE, but freebsd-update now updates to -p10 version not -p7,
> and i did not find a way to to get -p7. It seems to be unsupported
> feature.

I've tried to use the kernel.debug from 13.0-RELEASE but the output
doesn't seem to make any sense.

> What do you mean to use debug kernel with WITNESS and INVARIANTS? To
> build custom kernel GENERIC + add those two options or is there a
> common kernel build config used by devs that already includes those
> options?

IIRC the main branch has those options enabled, but the stable and
releng branches do not. If you build from a stable or releng branch you
will have to add:

makeoptions DEBUG=-g
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC

to your kernel config.

I.e., these options are removed from the GENERIC config during the
preparation of a release:

https://cgit.freebsd.org/src/commit/sys/amd64/conf/GENERIC?id=bfd15705156b0436cfe79aea11868dcc0c6e078a

Regards, Roger.
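
For illustration, the debug options listed above could be collected in a
separate config file that layers on top of GENERIC. This is only a
sketch: the file name XENDEBUG and the use of "include GENERIC" are
assumptions, not something taken from this thread; only the option
names come from the list above.

```
# sys/amd64/conf/XENDEBUG (hypothetical file name)
include		GENERIC			# start from the stock config
ident		XENDEBUG		# assumed kernel ident

makeoptions	DEBUG=-g		# build with debug symbols
options 	INVARIANTS		# extra run-time sanity checks
options 	INVARIANT_SUPPORT	# required by INVARIANTS
options 	WITNESS			# lock order verification
options 	WITNESS_SKIPSPIN	# skip WITNESS checks on spin locks
options 	DEBUG_LOCKS
options 	DEBUG_VFS_LOCKS
options 	DIAGNOSTIC
```

With such a file in place, the kernel would typically be built and
installed from the top of the source tree with
"make buildkernel KERNCONF=XENDEBUG" followed by
"make installkernel KERNCONF=XENDEBUG".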