Re: ZFS + FreeBSD XEN dom0 panic

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Wed, 02 Mar 2022 13:57:12 UTC
On Wed, Mar 02, 2022 at 10:57:37AM +0200, Ze Dupsys wrote:
> Hello,
> 
> I started using Xen on one pre-production machine (with the aim of
> using it in production later) on 12.2, but since it experienced
> random crashes I updated to 13.0 in the hope that the errors might
> disappear.
> 
> I do not know how much detail to include, so that this email is not
> too long but still gives enough info.
> 
> The FreeBSD Dom0 is installed on ZFS, a fairly basic install; IPFW
> rules are used for NATting. The zpool is composed of 2 mirrored
> disks. There is a ZVOL with volmode=dev for each VM and for each
> VM's jail, attached as raw devices to the DomU. At the moment the
> DomUs run FreeBSD, from 12.0 to 13.0, on UFS, with VNET jails and
> epairs all bridged to the DomU's xn0 interface. On Dom0 I have
> bridge interfaces to which the DomUs are connected depending on
> their "zone/network"; those that are allowed outgoing connections
> are NATted by IPFW on a specific physical NIC and IP.
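
For reference, the per-VM ZVOL disk arrangement described above
typically boils down to something like this (the pool, volume and
domU names below are placeholders, not taken from your setup):

  # create a ZVOL exposed as a plain raw device node
  zfs create -V 20G -o volmode=dev tank/vm1-disk0

  # and hand it to the domU as a phy: backed disk in its xl config
  disk = [ 'phy:/dev/zvol/tank/vm1-disk0,xvda,w' ]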

So from the traces on the ticket:

panic: pmap_growkernel: no memory to grow kernel
cpuid = 0
time = 1646123072
KDB: stack backtrace:
#0 0xffffffff80c57525 at kdb_backtrace+0x65
#1 0xffffffff80c09f01 at vpanic+0x181
#2 0xffffffff80c09d73 at panic+0x43
#3 0xffffffff81073eed at pmap_growkernel+0x27d
#4 0xffffffff80f2dae8 at vm_map_insert+0x248
#5 0xffffffff80f30249 at vm_map_find+0x549
#6 0xffffffff80f2bf76 at kmem_init+0x226
#7 0xffffffff80c73341 at vmem_xalloc+0xcb1
#8 0xffffffff80c72c3b at vmem_xalloc+0x5ab
#9 0xffffffff80f2bfce at kmem_init+0x27e
#10 0xffffffff80c73341 at vmem_xalloc+0xcb1
#11 0xffffffff80c72c3b at vmem_xalloc+0x5ab
#12 0xffffffff80c72646 at vmem_alloc+0x46
#13 0xffffffff80f2b616 at kmem_malloc_domainset+0x96
#14 0xffffffff80f21a2a at uma_prealloc+0x23a
#15 0xffffffff80f235de at sysctl_handle_uma_zone_cur+0xe2e
#16 0xffffffff80f1f6af at uma_set_align+0x8f
#17 0xffffffff82463362 at abd_borrow_buf_copy+0x22
Uptime: 4m9s


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x22710028
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c45892
stack pointer           = 0x28:0xfffffe0096600930
frame pointer           = 0x28:0xfffffe0096600930
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1496 (devmatch)
trap number             = 12
panic: page fault
cpuid = 0
time = 1646123791
KDB: stack backtrace:
#0 0xffffffff80c57525 at kdb_backtrace+0x65
#1 0xffffffff80c09f01 at vpanic+0x181
#2 0xffffffff80c09d73 at panic+0x43
#3 0xffffffff8108b1a7 at trap+0xbc7
#4 0xffffffff8108b1ff at trap+0xc1f
#5 0xffffffff8108a85d at trap+0x27d
#6 0xffffffff81061b18 at calltrap+0x8
#7 0xffffffff80c62011 at rman_is_region_manager+0x241
#8 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
#9 0xffffffff80c1949c at kernel_sysctl+0x43c
#10 0xffffffff80c19b13 at userland_sysctl+0x173
#11 0xffffffff80c1995f at sys___sysctl+0x5f
#12 0xffffffff8108baac at amd64_syscall+0x10c
#13 0xffffffff8106243e at Xfast_syscall+0xfe


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x68
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff824a599d
stack pointer           = 0x28:0xfffffe00b1e27910
frame pointer           = 0x28:0xfffffe00b1e279b0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (xbbd7 taskq)
trap number             = 12
panic: page fault
cpuid = 1
time = 1646122723
KDB: stack backtrace:
#0 0xffffffff80c57525 at kdb_backtrace+0x65
#1 0xffffffff80c09f01 at vpanic+0x181
#2 0xffffffff80c09d73 at panic+0x43
#3 0xffffffff8108b1a7 at trap+0xbc7
#4 0xffffffff8108b1ff at trap+0xc1f
#5 0xffffffff8108a85d at trap+0x27d
#6 0xffffffff81061b18 at calltrap+0x8
#7 0xffffffff8248935a at dmu_read+0x2a
#8 0xffffffff82456a3a at zvol_geom_bio_strategy+0x2aa
#9 0xffffffff80a7f214 at xbd_instance_create+0xa394
#10 0xffffffff80a7b1ea at xbd_instance_create+0x636a
#11 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
#12 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
#13 0xffffffff80bc7e3e at fork_exit+0x7e
#14 0xffffffff81062b9e at fork_trampoline+0xe
Uptime: 1h44m10s

This all looks to me like an out-of-memory condition. Can you check
with `top` what's going on with your memory?
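
For example, something like the following (just one possible set of
options) runs top in batch mode so it can be redirected to a file,
showing the largest resident processes plus the Mem/ARC summary lines:

  top -b -o res | head -n 20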

It might also be helpful to record periodic output of
`vmstat -m | sort -k 2 -r` to try to figure out what's using so much
memory.
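
A minimal sh loop along these lines would do (the interval and log
path are arbitrary choices):

  #!/bin/sh
  # Append a timestamped snapshot of kernel malloc usage once a minute.
  while :; do
          date
          vmstat -m | sort -k 2 -r
          echo
          sleep 60
  done >> /var/tmp/vmstat-m.log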

Regards, Roger.