Re: ZFS + FreeBSD XEN dom0 panic
- Reply: Roger Pau Monné : "Re: ZFS + FreeBSD XEN dom0 panic"
- In reply to: Roger Pau Monné : "Re: ZFS + FreeBSD XEN dom0 panic"
Date: Wed, 02 Mar 2022 17:26:18 UTC
Today I managed to crash the lab Dom0 with:

  xen_cmdline="dom0_mem=6144M dom0_max_vcpus=2 dom0=pvh,verbose=1 console=vga,com1 com1=9600,8n1 guest_loglvl=all loglvl=all sync_console=1 reboot=no"

I logged 'vmstat -m | sort -k 2 -r' every 120 seconds; the latest sample is in the attachment. The panic had the same fingerprint as the already-reported one with the "rman_is_region_manager" line.

The scripts I ran in parallel were generally the same as the ones attached to the bug report, just slightly modified (rough sketches of these loops are at the end of this mail):

1) ./libexec.sh zfs_volstress_fast_4g (creates new ZVOLs, but writes 4GB instead of 2GB into each created ZVOL with dd if=/dev/zero)
2) ./test_vm1_zvol_3gb.sh (loops: start the first DomU, write 3GB into its /tmp, restart the DomU, remove /tmp, repeat)
3) ./test_vm2_zvol_5_on_off.sh (loops: start the second DomU, which has 5 disks attached, turn the DomU off, repeat)
4) monitoring: sleep 120 seconds, print the vmstat | sort output to serial.

Around dom id 108 the system started to behave suspiciously: xl list showed the DomUs as created, but they did not really start up; the script timed out on the ssh connection, and there was no VNC. When I manually did xl destroy and then xl create, the system panicked.

I have log files of all the serial output and can provide them if anything in them is useful. The on-disk log files seem to lose the latest messages due to the crash.

On Wed, Mar 2, 2022 at 3:57 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Wed, Mar 02, 2022 at 10:57:37AM +0200, Ze Dupsys wrote:
> > Hello,
> >
> > I started using XEN on one pre-production machine (with the aim of
> > using it in production later) with 12.2, but since it experienced
> > random crashes I updated to 13.0 in the hope that the errors might
> > disappear.
> >
> > I do not know how detailed I should be, so that this email is not too
> > long but gives enough info.
> >
> > The FreeBSD Dom0 is installed on ZFS, a somewhat basic install; IPFW
> > and rules for NATting are used. The zpool is composed of 2 mirrored
> > disks. There is a ZVOL with volmode=dev for each VM and each VM's
> > jail, attached as raw devices to the DomU. At the moment the DomUs
> > contain FreeBSD, some 12.0 to 13.0, on UFS, with VNET jails and epairs
> > all bridged to the DomU's xn0 interface. On Dom0 I have bridge
> > interfaces, to which DomUs are connected depending on their
> > "zone/network"; those that are allowed outgoing connections are NATted
> > by IPFW on a specific physical NIC and IP.
>
> So from the traces on the ticket:
>
> panic: pmap_growkernel: no memory to grow kernel
> cpuid = 0
> time = 1646123072
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff81073eed at pmap_growkernel+0x27d
> #4 0xffffffff80f2dae8 at vm_map_insert+0x248
> #5 0xffffffff80f30249 at vm_map_find+0x549
> #6 0xffffffff80f2bf76 at kmem_init+0x226
> #7 0xffffffff80c73341 at vmem_xalloc+0xcb1
> #8 0xffffffff80c72c3b at vmem_xalloc+0x5ab
> #9 0xffffffff80f2bfce at kmem_init+0x27e
> #10 0xffffffff80c73341 at vmem_xalloc+0xcb1
> #11 0xffffffff80c72c3b at vmem_xalloc+0x5ab
> #12 0xffffffff80c72646 at vmem_alloc+0x46
> #13 0xffffffff80f2b616 at kmem_malloc_domainset+0x96
> #14 0xffffffff80f21a2a at uma_prealloc+0x23a
> #15 0xffffffff80f235de at sysctl_handle_uma_zone_cur+0xe2e
> #16 0xffffffff80f1f6af at uma_set_align+0x8f
> #17 0xffffffff82463362 at abd_borrow_buf_copy+0x22
> Uptime: 4m9s
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x22710028
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80c45892
> stack pointer = 0x28:0xfffffe0096600930
> frame pointer = 0x28:0xfffffe0096600930
> code segment = base rx0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 1496 (devmatch)
> trap number = 12
> panic: page fault
> cpuid = 0
> time = 1646123791
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108b1ff at trap+0xc1f
> #5 0xffffffff8108a85d at trap+0x27d
> #6 0xffffffff81061b18 at calltrap+0x8
> #7 0xffffffff80c62011 at rman_is_region_manager+0x241
> #8 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> #9 0xffffffff80c1949c at kernel_sysctl+0x43c
> #10 0xffffffff80c19b13 at userland_sysctl+0x173
> #11 0xffffffff80c1995f at sys___sysctl+0x5f
> #12 0xffffffff8108baac at amd64_syscall+0x10c
> #13 0xffffffff8106243e at Xfast_syscall+0xfe
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 02
> fault virtual address = 0x68
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff824a599d
> stack pointer = 0x28:0xfffffe00b1e27910
> frame pointer = 0x28:0xfffffe00b1e279b0
> code segment = base rx0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (xbbd7 taskq)
> trap number = 12
> panic: page fault
> cpuid = 1
> time = 1646122723
> KDB: stack backtrace:
> #0 0xffffffff80c57525 at kdb_backtrace+0x65
> #1 0xffffffff80c09f01 at vpanic+0x181
> #2 0xffffffff80c09d73 at panic+0x43
> #3 0xffffffff8108b1a7 at trap+0xbc7
> #4 0xffffffff8108b1ff at trap+0xc1f
> #5 0xffffffff8108a85d at trap+0x27d
> #6 0xffffffff81061b18 at calltrap+0x8
> #7 0xffffffff8248935a at dmu_read+0x2a
> #8 0xffffffff82456a3a at zvol_geom_bio_strategy+0x2aa
> #9 0xffffffff80a7f214 at xbd_instance_create+0xa394
> #10 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> #11 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> #12 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> #13 0xffffffff80bc7e3e at fork_exit+0x7e
> #14 0xffffffff81062b9e at fork_trampoline+0xe
> Uptime: 1h44m10s
>
> This all looks to me like an Out of Memory condition. Can you
> check with `top` what's going on with your memory?
>
> It might also be helpful to record periodic calls to
> `vmstat -m | sort -k 2 -r` to try to figure out what's using so much
> memory.
>
> Regards, Roger.
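
P.S. In outline, the parallel loops above look roughly as follows; the pool/dataset names, DomU names, config paths, sizes and timings are illustrative, the exact scripts are the ones attached to the bug report.

The ZVOL stress in 1) keeps creating volumes and filling each with 4GB of zeroes:

    #!/bin/sh
    # Keep creating ZVOLs and writing 4GB of zeroes into each one.
    # "tank/stress" and the 8G volume size are placeholders.
    n=0
    while :; do
        n=$((n + 1))
        zfs create -V 8G -o volmode=dev tank/stress/vol$n
        dd if=/dev/zero of=/dev/zvol/tank/stress/vol$n bs=1m count=4096
    done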
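The DomU loops in 2) and 3) just start, exercise and stop a guest repeatedly; 2) writes into the guest over ssh, while 3) only powers the 5-disk guest on and off:

    #!/bin/sh
    # Boot vm1, fill 3GB in its /tmp over ssh, clean up, shut down,
    # repeat. DomU name, config path, ssh host and sleep are guesses.
    while :; do
        xl create /usr/local/etc/xen/vm1.cfg
        sleep 60      # give the guest time to boot sshd
        ssh root@vm1 'dd if=/dev/zero of=/tmp/fill bs=1m count=3072 && rm /tmp/fill'
        xl shutdown -w vm1      # -w: wait until the domain is really gone
    done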
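And the monitoring in 4), which is also the recording Roger asks for: a timestamped 'vmstat -m | sort -k 2 -r' every 120 seconds, sent to the console so that the serial log still has the last sample after a crash (the on-disk copy loses it):

    #!/bin/sh
    # Sample kernel malloc usage every 120s, largest consumers first.
    # Writing to /dev/console assumes console output is mirrored to serial.
    while :; do
        date
        vmstat -m | sort -k 2 -r
        sleep 120
    done > /dev/console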