Re: ZFS + FreeBSD XEN dom0 panic

From: Ze Dupsys <zedupsys_at_gmail.com>
Date: Sat, 26 Mar 2022 22:38:00 UTC
On 2022.03.26. 16:38, Roger Pau Monné wrote:
 > ..
 > It's weird, because here you get a page fault, but there are also
 > traces with:
 > ..
 > general protection fault while in kernel mode
 > ..
 > That show a general protection fault instead of a page fault.

Yes indeed, I had not noticed this. Grepping across 34 stored panic log
files, I see that 28 are page faults, 4 are general protection faults,
and 2 are something else. I thought maybe RAM size influences this, but
page faults occurred with 2G, 4G, 6G and 8G Dom0, and general protection
faults with 2G, 4G and 8G.
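
For anyone who wants to repeat the tally, here is a minimal sketch of how
such a count could be done. It assumes one panic per file and all logs
collected in a single directory; the function name count_faults and that
layout are made up for illustration and are not how my logs are actually
stored.

```shell
#!/bin/sh
# Tally panic logs by fault type.
# Assumptions (hypothetical): one panic per file, all logs in one directory.
count_faults() {
    dir=$1
    total=0
    pf=0
    gpf=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        total=$((total + 1))
        # Check the longer message first so each file is counted once.
        if grep -q 'general protection fault while in kernel mode' "$f"; then
            gpf=$((gpf + 1))
        elif grep -q 'page fault while in kernel mode' "$f"; then
            pf=$((pf + 1))
        fi
    done
    echo "page fault: $pf"
    echo "general protection fault: $gpf"
    echo "other: $((total - pf - gpf))"
}
```

Called as "count_faults /path/to/logs", it prints three lines, one per
category.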

I have no idea what triggers which, since the stress tests and command
line args are more or less the same. The builds differ: various patches,
some debug info, etc. Almost all panic traces have
"rman_is_region_manager" in the middle; looking at all of them together
seemed interesting. I'll attach the unique panic traces, since some
include snprintf and kvprintf as well, which may be helpful.
Unfortunately I do not know which version and what patches were applied.


 > I've also noticed it seems to always be 'devmatch' the process that
 > triggers the panic.

Yes, it seems to be the case most of the time. There are 3 cases where
the process is "xbbd* taskq": 2 with 2G RAM, 1 with 6G.


 > I've been able to get a better trace with gdb and your debug symbols,
 > and this is:
 >
 > (gdb) info line *0xffffffff80c6a2b2
 > Line 1386 of "/usr/src/sys/kern/subr_bus.c" starts at address 0xffffffff80c6a2b2 <device_get_name+18>
 >     and ends at 0xffffffff80c6a2b6 <device_get_name+22>.
 > (gdb) info line *0xffffffff80c86ed1
 > Line 1052 of "/usr/src/sys/kern/subr_rman.c" starts at address 0xffffffff80c86ecc <sysctl_rman+540>
 >     and ends at 0xffffffff80c86ed5 <sysctl_rman+549>.

This is a nice find!


 > I'm trying to figure out how the device could be removed or
 > disconnected from the rman. I will try to create a patch to catch the
 > device that leaves rman regions when destroyed/removed.

Okay, I'll apply it when possible.

I ran xen-debug on a system with blkback.patch applied, the one you sent
in your next message after this one.

The system panicked with a new trace:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address	= 0xa4
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80c90ed0
stack pointer	        = 0x28:0xfffffe0051927ab0
frame pointer	        = 0x28:0xfffffe0051927ad0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 16 (xenwatch)
trap number		= 12
panic: page fault
cpuid = 1
time = 1648331592
KDB: stack backtrace:
#0 0xffffffff80c7c275 at kdb_backtrace+0x65
#1 0xffffffff80c2e2d1 at vpanic+0x181
#2 0xffffffff80c2e143 at panic+0x43
#3 0xffffffff810c8b97 at trap+0xba7
#4 0xffffffff810c8bef at trap+0xbff
#5 0xffffffff810c8243 at trap+0x253
#6 0xffffffff810a0838 at calltrap+0x8
#7 0xffffffff80a98515 at xbd_instance_create+0x7895
#8 0xffffffff80a98462 at xbd_instance_create+0x77e2
#9 0xffffffff80a9619b at xbd_instance_create+0x551b
#10 0xffffffff80f95c54 at xenbusb_localend_changed+0x7c4
#11 0xffffffff80ab0ef4 at xs_unlock+0x704
#12 0xffffffff80beaede at fork_exit+0x7e
#13 0xffffffff810a18ae at fork_trampoline+0xe

cat /tmp/panic.log | sed -Ee 's/^#[0-9]* //' -e 's/ .*//' | \
    xargs addr2line -e /usr/lib/debug/boot/kernel/kernel.debug

/usr/src/sys/kern/subr_kdb.c:443
/usr/src/sys/kern/kern_shutdown.c:0
/usr/src/sys/kern/kern_shutdown.c:844
/usr/src/sys/amd64/amd64/trap.c:944
/usr/src/sys/amd64/amd64/trap.c:0
/usr/src/sys/amd64/amd64/trap.c:0
/usr/src/sys/amd64/amd64/exception.S:292
/usr/src/sys/dev/xen/blkback/blkback.c:2789
/usr/src/sys/dev/xen/blkback/blkback.c:3431
/usr/src/sys/dev/xen/blkback/blkback.c:3912
/usr/src/sys/xen/xenbus/xenbusb_back.c:238
/usr/src/sys/dev/xen/xenstore/xenstore.c:1007
/usr/src/sys/kern/kern_fork.c:1099
/usr/src/sys/amd64/amd64/exception.S:1091
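
The pipeline above appears to assume that /tmp/panic.log holds only the
"#N 0x... at ..." backtrace lines. A slightly more defensive sketch
follows: it picks out just those lines from a full log, and passes -f to
addr2line so the function name is printed next to each file:line. The
helper names bt_addrs and resolve_bt are made up for illustration, and
the kernel.debug path is simply the one used above.

```shell
#!/bin/sh
# Print only the addresses from "#N 0xADDR at symbol+0xOFF" lines;
# every other line in the log is ignored.
bt_addrs() {
    sed -nE 's/^#[0-9]+ (0x[0-9a-fA-F]+) at .*/\1/p' "$@"
}

# Resolve the addresses against a debug kernel.
# -f makes addr2line print the enclosing function name as well.
# The default path is an assumption taken from the command above.
resolve_bt() {
    log=$1
    kernel=${2:-/usr/lib/debug/boot/kernel/kernel.debug}
    bt_addrs "$log" | xargs addr2line -f -e "$kernel"
}
```

Usage would be "resolve_bt /tmp/panic.log". With -f the real function
names should show up, which may help since the "xbd_instance_create+0x..."
offsets in the backtrace look too large to be inside that one function.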

The full serial log is in the attachment.

Thanks.