[Bug 265196] talos linux vms hang on reboot at the com ports, need to reboot the host to clear it up
Date: Thu, 21 Jul 2022 18:47:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265196

--- Comment #23 from John Baldwin <jhb@FreeBSD.org> ---
So in the case that bhyvectl hangs, from the procstat -kk output, bhyvectl is waiting because some other process still has the /dev/vmm/<vmname> file open. For that case, you can try using 'sudo procstat -af | grep <vmname>' to see which processes still have it open and are preventing bhyvectl from exiting.

For the case where you had a bhyve exit of 4 and an error of 'vm_open: No such file or directory', that may be a race with the async destroy used on 13.1 for bhyve (since fixed in 14 and stable/13, so that bhyvectl now sleeps until the --destroy request finishes before returning).

The return value of 134 is due to abort() and is the triple fault case you have logs of in bhyve.log. A triple fault isn't a crash of bhyve; it is a bit of an old-school way to reboot an x86 computer. It's perhaps a bit odd that a Linux guest would use that to reboot vs more conventional means. However, you shouldn't have to reboot the host machine just because the guest exits due to a triple fault. You should be able to restart the VM again without rebooting the host. Here I use "host" to mean the FreeBSD machine running bhyve VMs.

Looking again, it seems like the talos upgrade is perhaps trying to use kexec to upgrade instead of a real reboot, and that the second Linux kernel is perhaps crashing (and not trying to use a triple fault to reboot). Given the turnaround times for VM booting, you don't really need kexec for VMs. If you want to debug this, you will have to debug the crash that happens in the second Linux kernel. It may be that there is something bhyve isn't emulating quite right that results in the triple fault, but it will be hard to know what that is from the bhyve side. I would see if there's a way to configure talos to not use kexec and just use "plain" reboots for upgrades instead.
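As a rough sketch of the procstat check mentioned above: the helper below is hypothetical (the name vmm_holders is not part of any tool) and assumes that 'procstat -af' prints the PID in the first column, the command name in the second, and the open file's path in the last. It matches the device path exactly rather than by substring, so a VM named "talos1" does not also match "talos10".

```sh
#!/bin/sh
# Hypothetical helper: print "PID COMM" for every process that holds
# /dev/vmm/<vmname> open. Reads `procstat -af` output on stdin.
# Assumption: PID is field 1, COMM is field 2, and the path is the last field.
vmm_holders() {
    vm="$1"
    awk -v path="/dev/vmm/$vm" '$NF == path { print $1, $2 }'
}
```

On the FreeBSD host this would be run as 'sudo procstat -af | vmm_holders <vmname>'. Any process listed other than the one you expect is a candidate for what keeps bhyvectl from exiting.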