Re: ZFS + FreeBSD XEN dom0 panic

From: Ze Dupsys <zedupsys_at_gmail.com>
Date: Mon, 14 Mar 2022 08:06:58 UTC
I'd like to share some more analysis of this problem. I do not know
whether it helps, but I have noticed that across all my saved serial
log outputs, the panic messages follow lines like these.

..
(XEN) HVM d34v0 save: TSC_ADJUST
(XEN) HVM d34v0 save: CPU_MSR
(XEN) HVM34 restore: CPU 0
xnb(xnb_detach:1330):
xnb(xnb_detach:1339):
.. => panic


Most of the panics look like this:
..
(XEN) HVM d26v0 save: TSC_ADJUST
(XEN) HVM d26v0 save: CPU_MSR
(XEN) HVM26 restore: CPU 0
.. => panic

..
(XEN) HVM d42v0 save: TSC_ADJUST
(XEN) HVM d42v0 save: CPU_MSR
(XEN) HVM42 restore: CPU 0
xnb(xnb_detach:1330):
xnb(xnb_detach:1339):
xnb(xnb_detach:1330):
xnb(xnb_detach:1339):
.. => panic


This one, I think, had different stress conditions than the others, but
I don't remember the details:
..
(XEN) HVM d660v0 save: CPU_MSR
(XEN) HVM660 restore: CPU 0
(XEN) d659v0: upcall vector 93
spin lock 0xffffffff81eaa780 (sched lock 1) held by 0xfffff8020152d000 (tid 100434) too long
timeout stopping cpus
panic: spin lock held too long
.. => panic
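
The "which lines come right before the panic" comparison above could be
automated across saved logs. Below is a minimal sketch, assuming each
saved serial log is a plain text file and that the real panic line
starts with "panic:" as in the spin lock excerpt (the ".. => panic"
markers are only my annotations). The function name and the window size
are made up for illustration.

```python
import re
from collections import deque

# Assumption: the kernel panic line begins with "panic:", as in
# "panic: spin lock held too long" from the excerpt above.
PANIC_RE = re.compile(r"^panic:")


def context_before_panics(lines, n=5):
    """Yield the last n lines seen before each line that starts with 'panic:'."""
    window = deque(maxlen=n)  # sliding window of the most recent lines
    for line in lines:
        line = line.rstrip("\n")
        if PANIC_RE.match(line):
            # Copy the window so later appends do not change the result.
            yield list(window)
        window.append(line)
```

Usage would be something like
`for ctx in context_before_panics(open(path)): print(ctx)` for each
saved log, to compare the contexts side by side.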


In the parts of the serial output where no crash occurs, I have noticed
at least two different execution paths.

For most VMs, the boot flow continues with serial lines like these:
..
(XEN) HVM1 restore: CPU 0
xnb(xnb_probe:1123): Claiming device 0, xnb
xnb(xnb_attach:1267): Attaching to backend/vif/1/0
xnb(xnb_frontend_changed:1391): frontend_state=Initialising, xnb_state=InitWait
(d1) HVM Loader
..

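For some, though, there are lines like the ones below. These VMs still
boot; it just seems that these lines might be a possible continuation
after an "unsuccessful panic", i.e. the same xnb_detach lines appear as
before a panic, but the system keeps running.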
..
(XEN) HVM3 restore: CPU 0
xnb(xnb_detach:1330):
xnb(xnb_detach:1339):
xnb(xnb_detach:1330):
xnb(xnb_detach:1339):
xnb(xnb_probe:1123): Claiming device 0, xnb
xnb(xnb_attach:1267): Attaching to backend/vif/3/0
xnb(xnb_frontend_changed:1391): frontend_state=Initialising, xnb_state=InitWait
(d3) HVM Loader
..

Why do the lines starting with "xnb(xnb_detach:1330):" not have any
message? Could it be that there is a bad pointer to a message buffer
that cannot be printed, and that the panic sometimes happens because
the access goes outside the allowed memory region?

The line numbers are only roughly informational, since these messages
come from across all my tests, with various configs and versions.
Yesterday I set up a system with FreeBSD 13.1-STABLE; it can still
crash, with the same panic. What I do not know about the xnb messages
is which VM they relate to, since the serial output is shared and
things run in parallel: while VM1 is being created, VM2 might be
starting or being destroyed.
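
Some of the lines do carry a domain ID, so a partial attribution is
possible. Below is a rough sketch, assuming the number in "HVM3",
"d34v0", "(d3)" and the first number in "backend/vif/3/0" are all the
guest's domain ID, which is consistent with the excerpts above. The
patterns are taken only from the lines in this mail; the function names
are made up. The bare xnb_detach lines match none of them, which is
exactly the ambiguity described here.

```python
import re

# Patterns that carry a domain ID, based only on the excerpts in this mail.
DOMID_PATTERNS = [
    re.compile(r"^\(XEN\) HVM d(\d+)v\d+ "),  # (XEN) HVM d34v0 save: CPU_MSR
    re.compile(r"^\(XEN\) HVM(\d+) "),        # (XEN) HVM34 restore: CPU 0
    re.compile(r"^\(XEN\) d(\d+)v\d+:"),      # (XEN) d659v0: upcall vector 93
    re.compile(r"^\(d(\d+)\) "),              # (d3) HVM Loader
    re.compile(r"backend/vif/(\d+)/\d+"),     # Attaching to backend/vif/3/0
]


def domid_of(line):
    """Return the domain ID found in a serial log line, or None if there is none."""
    for pattern in DOMID_PATTERNS:
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


def unattributed(lines):
    """Return the lines that cannot be tied to a domain by themselves."""
    return [line for line in lines if domid_of(line) is None]
```

Running `unattributed()` over a log shows how many xnb lines are left
without an owner; tying those to a VM would need extra information,
for example a domain ID printed by the driver itself.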

Thanks.