Re: ZFS + FreeBSD XEN dom0 panic

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Sun, 27 Mar 2022 09:13:42 UTC
On Sun, Mar 27, 2022 at 12:38:00AM +0200, Ze Dupsys wrote:
> On 2022.03.26. 16:38, Roger Pau Monné wrote:
> > ..
> > It's weird, because here you get a page fault, but there are also
> > traces with:
> > ..
> > general protection fault while in kernel mode
> > ..
> > That show a general protection fault instead of a page fault.
> 
> Yes indeed, i had not noticed this. Grepped across 34 stored panic log
> files, i see that 28 are page fault, 4 are general protection fault, 2
> other. I though maybe RAM size influences this, but page faults have 2G, 4G,
> 6G, 8G Dom0, general protection faults have 2G, 4G, 8G.
> 
> I have no idea what triggers what, since stress tests and command line args
> are more or less the same. Builds are different with patches, some debug
> info, etc. Almost all panic traces have "rman_is_region_manager" in mid,
> actually looking all of them together seemed interesting. I'll attach unique
> panic traces, since some included snprintf, kvprintf as well, maybe helpful.
> Unfortunately i do not know which version and what patches were applied.
> 
> 
> > I've also noticed it seems to always be 'devmatch' the process that
> > triggers the panic.
> 
> Yes, it seems to be the case most of the time. There are 3 cases when
> process is "xbbd* taskq". 2 cases with 2G RAM, 1 with 6G.
> 
> 
> > I've been able to get a better trace with gdb and your debug symbols,
> > and this is:
> >
> > (gdb) info line *0xffffffff80c6a2b2
> > Line 1386 of "/usr/src/sys/kern/subr_bus.c" starts at address
> 0xffffffff80c6a2b2 <device_get_name+18>
> >     and ends at 0xffffffff80c6a2b6 <device_get_name+22>.
> > (gdb) info line *0xffffffff80c86ed1
> > Line 1052 of "/usr/src/sys/kern/subr_rman.c" starts at address
> 0xffffffff80c86ecc <sysctl_rman+540>
> >     and ends at 0xffffffff80c86ed5 <sysctl_rman+549>.
> 
> This is a nice find!
> 
> 
> > I'm trying to figure out how the device could be removed or
> > disconnected from the rman. I will try to create a patch to catch the
> > device that leaves rman regions when destroyed/removed.
> 
> Okay, i'll apply when it will be possible.
> 
> I did run xen-debug on system with applied blkback.patch as you sent in next
> message to this.
> 
> System had panic with new trace:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address	= 0xa4
> fault code		= supervisor read data, page not present
> instruction pointer	= 0x20:0xffffffff80c90ed0
> stack pointer	        = 0x28:0xfffffe0051927ab0
> frame pointer	        = 0x28:0xfffffe0051927ad0
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 16 (xenwatch)
> trap number		= 12
> panic: page fault
> cpuid = 1
> time = 1648331592
> KDB: stack backtrace:
> #0 0xffffffff80c7c275 at kdb_backtrace+0x65
> #1 0xffffffff80c2e2d1 at vpanic+0x181
> #2 0xffffffff80c2e143 at panic+0x43
> #3 0xffffffff810c8b97 at trap+0xba7
> #4 0xffffffff810c8bef at trap+0xbff
> #5 0xffffffff810c8243 at trap+0x253
> #6 0xffffffff810a0838 at calltrap+0x8
> #7 0xffffffff80a98515 at xbd_instance_create+0x7895
> #8 0xffffffff80a98462 at xbd_instance_create+0x77e2
> #9 0xffffffff80a9619b at xbd_instance_create+0x551b
> #10 0xffffffff80f95c54 at xenbusb_localend_changed+0x7c4
> #11 0xffffffff80ab0ef4 at xs_unlock+0x704
> #12 0xffffffff80beaede at fork_exit+0x7e
> #13 0xffffffff810a18ae at fork_trampoline+0xe
> 
> cat /tmp/panic.log| sed -Ee 's/^#[0-9]* //' -e 's/ .*//' | xargs addr2line
> -e /usr/lib/debug/boot/kernel/kernel.debug
> 
> /usr/src/sys/kern/subr_kdb.c:443
> /usr/src/sys/kern/kern_shutdown.c:0
> /usr/src/sys/kern/kern_shutdown.c:844
> /usr/src/sys/amd64/amd64/trap.c:944
> /usr/src/sys/amd64/amd64/trap.c:0
> /usr/src/sys/amd64/amd64/trap.c:0
> /usr/src/sys/amd64/amd64/exception.S:292
> /usr/src/sys/dev/xen/blkback/blkback.c:2789
> /usr/src/sys/dev/xen/blkback/blkback.c:3431
> /usr/src/sys/dev/xen/blkback/blkback.c:3912
> /usr/src/sys/xen/xenbus/xenbusb_back.c:238
> /usr/src/sys/dev/xen/xenstore/xenstore.c:1007
> /usr/src/sys/kern/kern_fork.c:1099
> /usr/src/sys/amd64/amd64/exception.S:1091

Thanks, unfortunately that patch was incomplete. I have an updated
version that I think is better now, and I've slightly tested it
(creating and destroying a domain with it doesn't seem to crash).
Appended patch at the end of the message.

> 
> Full serial log in attachment.
> 
> Thanks.

> ==== COUNT: 1
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer	= 0x20:0xffffffff80c45892
> stack pointer	        = 0x28:0xfffffe00d2d2b930
> frame pointer	        = 0x28:0xfffffe00d2d2b930
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 10984 (devmatch)
> trap number		= 9
> panic: general protection fault
> cpuid = 0
> time = 1646305680
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108a66e at trap+0x8e
> #5 0xffffffff81061b18 at calltrap+0x8
> #6 0xffffffff80c62011 at rman_is_region_manager+0x241
> #7 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> #8 0xffffffff80c1949c at kernel_sysctl+0x43c
> #9 0xffffffff80c19b13 at userland_sysctl+0x173
> #10 0xffffffff80c1995f at sys___sysctl+0x5f
> #11 0xffffffff8108baac at amd64_syscall+0x10c
> #12 0xffffffff8106243e at Xfast_syscall+0xfe
> Uptime: 1h15m46s
> 
> 
> ==== COUNT: 3
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer	= 0x20:0xffffffff80d0728f
> stack pointer	        = 0x28:0xfffffe00a17ea790
> frame pointer	        = 0x28:0xfffffe00a17ea790
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 2785 (devmatch)
> trap number		= 9
> panic: general protection fault
> cpuid = 1
> time = 1646419029
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108a66e at trap+0x8e
> #5 0xffffffff81061b18 at calltrap+0x8
> #6 0xffffffff80c5da17 at kvprintf+0x1007
> #7 0xffffffff80c5e719 at snprintf+0x59
> #8 0xffffffff80c6204b at rman_is_region_manager+0x27b
> #9 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> #10 0xffffffff80c1949c at kernel_sysctl+0x43c
> #11 0xffffffff80c19b13 at userland_sysctl+0x173
> #12 0xffffffff80c1995f at sys___sysctl+0x5f
> #13 0xffffffff8108baac at amd64_syscall+0x10c
> #14 0xffffffff8106243e at Xfast_syscall+0xfe

> Unique on "current process" and trace fingerprint #0-#*.
> 
> ==== COUNT: 23
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x22710028
> fault code    = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80c45892
> stack pointer         = 0x28:0xfffffe0096600930
> frame pointer         = 0x28:0xfffffe0096600930
> code segment    = base 0x0, limit 0xfffff, type 0x1b
>       = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 1496 (devmatch)
> trap number   = 12
> panic: page fault
> cpuid = 0
> time = 1646123791
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108b1ff at trap+0xc1f
> #5 0xffffffff8108a85d at trap+0x27d
> #6 0xffffffff81061b18 at calltrap+0x8
> #7 0xffffffff80c62011 at rman_is_region_manager+0x241
> #8 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> #9 0xffffffff80c1949c at kernel_sysctl+0x43c
> #10 0xffffffff80c19b13 at userland_sysctl+0x173
> #11 0xffffffff80c1995f at sys___sysctl+0x5f
> #12 0xffffffff8108baac at amd64_syscall+0x10c
> #13 0xffffffff8106243e at Xfast_syscall+0xfe
> 
> 
> 
> ==== COUNT: 2
> current process   = 20284 (devmatch)
> trap number   = 12
> panic: page fault
> cpuid = 3
> time = 1647247618
> KDB: stack backtrace:
> #0 0xffffffff80c7c615 at kdb_backtrace+0x65
> #1 0xffffffff80c2e621 at vpanic+0x181
> #2 0xffffffff80c2e493 at panic+0x43
> #3 0xffffffff810c8b97 at trap+0xba7
> #4 0xffffffff810c8bef at trap+0xbff
> #5 0xffffffff810c8243 at trap+0x253
> #6 0xffffffff810a09d8 at calltrap+0x8
> #7 0xffffffff80c82c77 at kvprintf+0x1007
> #8 0xffffffff80c83a09 at snprintf+0x59
> #9 0xffffffff80c8729b at rman_is_region_manager+0x27b
> #10 0xffffffff80c3ee81 at sbuf_new_for_sysctl+0x101
> #11 0xffffffff80c3e2cc at kernel_sysctl+0x3ec
> #12 0xffffffff80c3e943 at userland_sysctl+0x173
> #13 0xffffffff80c3e78f at sys___sysctl+0x5f
> #14 0xffffffff810c949c at amd64_syscall+0x10c
> #15 0xffffffff810a12eb at Xfast_syscall+0xfb

Thanks, those all seem to be related to a device being removed without
cleaning it's rman regions properly. So far I've spotted an issue in
blkback in this regard, but I wouldn't discard other issues in either
blkback or netback. Let's see if the updated blkback patch makes a
difference now.

> ==== COUNT: 2
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 02
> fault virtual address = 0x68
> fault code    = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff824a599d
> stack pointer         = 0x28:0xfffffe00b1e27910
> frame pointer         = 0x28:0xfffffe00b1e279b0
> code segment    = base 0x0, limit 0xfffff, type 0x1b
>       = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 0 (xbbd7 taskq)
> trap number   = 12
> panic: page fault
> cpuid = 1
> time = 1646122723
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108b1ff at trap+0xc1f
> #5 0xffffffff8108a85d at trap+0x27d
> #6 0xffffffff81061b18 at calltrap+0x8
> #7 0xffffffff8248935a at dmu_read+0x2a
> #8 0xffffffff82456a3a at zvol_geom_bio_strategy+0x2aa
> #9 0xffffffff80a7f214 at xbd_instance_create+0xa394
> #10 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> #11 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> #12 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> #13 0xffffffff80bc7e3e at fork_exit+0x7e
> #14 0xffffffff81062b9e at fork_trampoline+0xe
> 
> 
> ==== COUNT: 1
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 02
> fault virtual address = 0x148
> fault code    = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff8248cef4
> stack pointer         = 0x28:0xfffffe009941d9a0
> frame pointer         = 0x28:0xfffffe009941d9a0
> code segment    = base 0x0, limit 0xfffff, type 0x1b
>       = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 0 (xbbd1 taskq)
> trap number   = 12
> panic: page fault
> cpuid = 1
> time = 1646129773
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108b1ff at trap+0xc1f
> #5 0xffffffff8108a85d at trap+0x27d
> #6 0xffffffff81061b18 at calltrap+0x8
> #7 0xffffffff825cb76e at zil_open+0xe
> #8 0xffffffff82456d02 at zvol_ensure_zilog+0xb2
> #9 0xffffffff82456818 at zvol_geom_bio_strategy+0x88
> #10 0xffffffff80a7f214 at xbd_instance_create+0xa394
> #11 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> #12 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> #13 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> #14 0xffffffff80bc7e3e at fork_exit+0x7e
> #15 0xffffffff81062b9e at fork_trampoline+0xe

Hm, those last ones are in ZFS code, can you try to get the line
numbers for those?

Maybe it's blkback providing bad data to the disk open functions.

Since you are doing so much testing, it might make sense for you to
use a debug FreeBSD kernel rather than a production one (one with
WITNESS and INVARIANTS enabled).

Thanks, Roger.

---8<---
diff --git a/sys/dev/xen/blkback/blkback.c b/sys/dev/xen/blkback/blkback.c
index 33414295bf5e..4007a93a54c7 100644
--- a/sys/dev/xen/blkback/blkback.c
+++ b/sys/dev/xen/blkback/blkback.c
@@ -2774,19 +2774,12 @@ xbb_free_communication_mem(struct xbb_softc *xbb)
 static int
 xbb_disconnect(struct xbb_softc *xbb)
 {
-	struct gnttab_unmap_grant_ref  ops[XBB_MAX_RING_PAGES];
-	struct gnttab_unmap_grant_ref *op;
-	u_int			       ring_idx;
-	int			       error;
-
 	DPRINTF("\n");
 
-	if ((xbb->flags & XBBF_RING_CONNECTED) == 0)
-		return (0);
-
 	mtx_unlock(&xbb->lock);
 	xen_intr_unbind(&xbb->xen_intr_handle);
-	taskqueue_drain(xbb->io_taskqueue, &xbb->io_task); 
+	if (xbb->io_taskqueue != NULL)
+		taskqueue_drain(xbb->io_taskqueue, &xbb->io_task);
 	mtx_lock(&xbb->lock);
 
 	/*
@@ -2796,19 +2789,28 @@ xbb_disconnect(struct xbb_softc *xbb)
 	if (xbb->active_request_count != 0)
 		return (EAGAIN);
 
-	for (ring_idx = 0, op = ops;
-	     ring_idx < xbb->ring_config.ring_pages;
-	     ring_idx++, op++) {
-		op->host_addr    = xbb->ring_config.gnt_addr
-			         + (ring_idx * PAGE_SIZE);
-		op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
-		op->handle	 = xbb->ring_config.handle[ring_idx];
-	}
+	if (xbb->flags & XBBF_RING_CONNECTED) {
+		struct gnttab_unmap_grant_ref  ops[XBB_MAX_RING_PAGES];
+		struct gnttab_unmap_grant_ref *op;
+		unsigned int ring_idx;
+		int error;
+
+		for (ring_idx = 0, op = ops;
+		     ring_idx < xbb->ring_config.ring_pages;
+		     ring_idx++, op++) {
+			op->host_addr    = xbb->ring_config.gnt_addr
+				         + (ring_idx * PAGE_SIZE);
+			op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
+			op->handle	 = xbb->ring_config.handle[ring_idx];
+		}
 
-	error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
-					  xbb->ring_config.ring_pages);
-	if (error != 0)
-		panic("Grant table op failed (%d)", error);
+		error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
+						  xbb->ring_config.ring_pages);
+		if (error != 0)
+			panic("Grant table op failed (%d)", error);
+
+		xbb->flags &= ~XBBF_RING_CONNECTED;
+	}
 
 	xbb_free_communication_mem(xbb);
 
@@ -2839,7 +2841,6 @@ xbb_disconnect(struct xbb_softc *xbb)
 		xbb->request_lists = NULL;
 	}
 
-	xbb->flags &= ~XBBF_RING_CONNECTED;
 	return (0);
 }