Re: ZFS + FreeBSD XEN dom0 panic
- In reply to: Roger Pau Monné : "Re: ZFS + FreeBSD XEN dom0 panic"
Date: Thu, 10 Mar 2022 17:46:21 UTC
On 2022.03.10. 17:40, Roger Pau Monné wrote:
> That's not expected. Can you paste the output of `xenstore-ls -fp`
> when you get those stale entries?

Yes, I can (xenstore_ls_fp.log). Today I was stressing 12.1 with 8GB RAM for 10 hours. I will power off all DomUs and attach the output of the commands to this email.

> Also, what does `xl list`?

I attached xl_list.log so as not to lose formatting due to email software. I added the output of "sysctl -a -N | grep xbbd" as well, since it shows that the sysctl variables exist.

> Also, can you check the log files at '/var/log/xen/xl-*.log' (where * is
> the domain name) to try to gather why the backend is not properly
> destroyed?

Those files do not seem to be helpful. I have not seen any messages other than these, e.g. xl-xen-vm2-zvol-5.log.7:

Waiting for domain xen-vm2-zvol-5 (domid 486) to die [pid 11628]
Domain 486 has shut down, reason code 0 0x0
Action for shutdown reason code 0 is destroy
Domain 486 needs to be cleaned up: destroying the domain
Done. Exiting now

BTW, is it possible to disable automatic log file rotation in /var/log/xen? Or to make it always append to a single file so that no info is lost? Maybe there was some info for domain ID 200, for example; it's just that I have never seen different messages there.

The same goes for the qemu-dm-xen-* files, nothing eye-catching:

qemu-system-i386: -serial pty: char device redirected to /dev/pts/6 (label serial0)
VNC server running on 0.0.0.0:5907
qemu-system-i386: terminating on signal 1 from pid 24773 (xl)

> Right, at some point you will run out of memory if resources are not
> properly freed. Can you try to boot with "boot_verbose=YES" in
> /boot/loader.conf and see if that gives you are more information?

Yes, I will add it and see if more info is available.

> Otherwise I might have to provide you with a patch to blkback in order
> to attempt to detect why backends are not destroyed.

Sure. I'll just have to learn how to compile Xen from source then. I think I saw a howto somewhere.
Which version of FreeBSD should I stress? 13.0? Or must it all be built from current source?

> Since you stress the system quite badly, do you by any chance see 'xl'
> processes getting terminated? Background xl processes being killed
> will lead to backends not being properly shut down.

I wouldn't stress it so much if it had behaved in the first place when the load was normal :)

I've never seen any process with xl in its name being killed. qemu-system-i386 processes have been OOM-killed, though, but in those instances the VM usually just does not start, or it cleans up and is gone. And I've seen the system panic without any OOM kill messages.

As for the crashes under normal load, I only noticed that they always happened when there was some sort of HDD load, but I never remembered how many VMs I had started or restarted. So I wrote these stress test scripts in the hope of finding the reason why those crashes happened at all. I would not even mind if just the DomUs crashed, as long as their HDDs are not corrupted and Dom0 stays stable and can automatically reboot them.

As for the tests, I've been experimenting in various ways to understand whether the panic depends on the VM restart count, the number of attached HDDs, or the data throughput on the HDDs. But for now I'm going in circles, because sometimes the panic happens early and sometimes it happens somewhere between domID 200 and 300 or later, and those non-deterministic cases are hard to pin down.

Thanks!