Re: ZFS + FreeBSD XEN dom0 panic
Date: Fri, 18 Mar 2022 15:24:17 UTC
On Tue, Mar 15, 2022 at 08:51:57AM +0200, Ze Dupsys wrote:
> On 2022.03.14. 11:19, Roger Pau Monné wrote:
> > On Mon, Mar 14, 2022 at 10:06:58AM +0200, Ze Dupsys wrote:
> > > ..
> > >
> > > Why do those lines starting with "xnb(xnb_detach:1330):" not have any message?
> > > Could it be that there is a bad pointer to a message buffer that cannot be
> > > printed? And then sometimes a panic happens because access goes out of the allowed
> > > memory region?
> > Some messages in netback are just "\n", likely leftovers from debugging.
> Okay, found the lines, it is as you say. So this will not be an easy one.
> >
> > Can you try to stress the system again but this time with guests not
> > having any network interfaces? (so that netback doesn't get used in
> > dom0).
> I'll try to come up with something. At the moment all commands to VMs are
> given through ssh.
> >
> > Then if you could rebuild the FreeBSD dom0 kernel with the above patch
> > we might be able to get a bit more info about blkback shutdown.
> I rebuilt 13.1 STABLE, commenting out #undef and adding #define, so
> line numbers will differ by a single line. For this test I did not remove
> network interfaces, and I did add DPRINTF messages to the xnb_detach function as
> well, since I hoped to maybe catch something there by printing pointers. I
> did not like that xnb_detach does not check for a NULL return from
> device_get_softc, nor the device_t argument, so I thought maybe those
> crashes were related to that. But I guess this will not be so easy,
> and maybe it is safe to assume that "device_t dev" is always valid in that
> context.
>
> So I ran the stress test. The system did not crash; as often happens when more
> debugging info is printed, the characteristics change. But it did leak sysctl
> xbbd variables. I'll attach all collected log files. The sysctl and xl list
> commands differ in timing a little bit. xl list _02 is when all VMs are
> turned off. Sysctl only has keys without values, so as not to trigger xnb tests
> while reading all values.

So I've been staring at this for a while, and I'm not yet sure I figured
out exactly what's going on, but can you give the patch below a try?

Thanks, Roger.
---8<---
diff --git a/sys/xen/xenbus/xenbusb.c b/sys/xen/xenbus/xenbusb.c
index e026f8203ea1..a8b75f46b9cc 100644
--- a/sys/xen/xenbus/xenbusb.c
+++ b/sys/xen/xenbus/xenbusb.c
@@ -254,7 +254,7 @@ xenbusb_delete_child(device_t dev, device_t child)
 static void
 xenbusb_verify_device(device_t dev, device_t child)
 {
-	if (xs_exists(XST_NIL, xenbus_get_node(child), "") == 0) {
+	if (xs_exists(XST_NIL, xenbus_get_node(child), "state") == 0) {
 		/*
 		 * Device tree has been removed from Xenbus.
 		 * Tear down the device.