Windows 11 22H2 with passed-through PCI devices hangs in vm_handle_rendezvous() at boot

From: Robert Crowston <crowston_at_protonmail.com>
Date: Tue, 03 Jan 2023 23:54:16 UTC

Still investigating this. AMD 1700, FreeBSD 13.1 stable@3dd6497894. VM is Windows 11 22H2.

It happens on the setup disk -- at the TianoCore logo, before the "ring" has finished its first rotation -- so very early in the boot process. It has eventually happened on every Windows 11 install I have made. If I remove the passthrough devices, install Windows, and then re-add the devices, the fresh install boots with the passthrough devices a few times, but then shows the same hang forever after. Windows Boot Repair also hangs. On the host, bhyvectl --destroy hangs. gdb cannot stop bhyve and hangs as well. None of these hung processes shows any CPU use. kldunload vmm crashes the host with a page fault. Only rebooting the host kills the guest.

Setting the guest CPU count to 1, or removing all the passthrough devices, allows Windows 11 to boot. The same behaviour happens with two different USB controllers I have and two different GPUs. The same bhyve configurations reliably boot Windows Server 2022 and Windows 10 with passthrough working.

Debugging in userspace, I can see that Windows 11 does PCI enumeration in parallel across multiple cores, and sometimes during boot one vCPU writes a PCI config register at approximately the same time as another vCPU reads that exact register. The hang seems to coincide with this near-simultaneous write/read. Also, I can sometimes boot successfully under gdb when single-stepping PCI config register writes, but it's difficult to be sure because my debugging is probably disturbing the timing. I looked at the bhyve code and I don't see what could be racing in user space. In any event, it's a kernel-side bug.

Spinning up the kernel debugger, what I always see is:
1. 1 bhyve thread in vioapic_mmio_write() -> ... -> vm_handle_rendezvous() -> _sleep()
2. 1 bhyve thread in vcpu_lock_one() -> ... -> vcpu_set_state_locked() -> msleep_spin_sbt()
3. All remaining bhyve threads, if any, in vm_run() -> vm_handle_rendezvous() -> _sleep().

Example backtrace attached.

So it looks like we have some kind of deadlock between vcpu_lock_one() and vioapic_mmio_write()? Has anyone seen anything like it?
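
To make the suspected cycle concrete, here is a toy model in C. It is not vmm code: the thread numbering, the wait sets, and the names stuck_threads(), smp_waits, up_waits and self_check() are all hypothetical, and the edge "the vcpu_lock_one() thread waits for a vCPU that is itself parked in the rendezvous" is my assumption about what the backtraces mean, not something I have confirmed. Each thread gets a bitmask of the threads it needs progress from, and a thread counts as stuck if that set can never drain.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Toy wait-for model (illustrative only, not FreeBSD vmm code).
 * waits_on[i] is a bitmask of the threads that thread i needs
 * progress from. Returns a bitmask of threads that can never run.
 */
static unsigned
stuck_threads(const unsigned waits_on[], int n)
{
	unsigned done = 0;
	int changed = 1;

	/* Fixed point: a thread finishes once everything it waits on has. */
	while (changed) {
		changed = 0;
		for (int i = 0; i < n; i++) {
			unsigned bit = 1u << i;

			if ((done & bit) == 0 && (waits_on[i] & ~done) == 0) {
				done |= bit;
				changed = 1;
			}
		}
	}
	return (((1u << n) - 1) & ~done);
}

/* ASSUMED wait sets for a 3-vCPU guest, one entry per observed state. */
static const unsigned smp_waits[3] = {
	(1u << 1) | (1u << 2),	/* T0: vioapic_mmio_write() rendezvous, needs T1 and T2 to check in */
	1u << 2,		/* T1: vcpu_lock_one(), assumed to wait for T2's vCPU to go idle */
	1u << 1,		/* T2: vm_run() rendezvous, cannot finish until T1 checks in */
};

/* With one vCPU there is nobody else to wait for. */
static const unsigned up_waits[1] = { 0 };

static void
self_check(void)
{
	unsigned smp = stuck_threads(smp_waits, 3);
	unsigned up = stuck_threads(up_waits, 1);

	printf("3 vCPUs: stuck mask 0x%x\n", smp);
	printf("1 vCPU:  stuck mask 0x%x\n", up);
	assert(smp == 0x7u);	/* every thread is part of the cycle */
	assert(up == 0x0u);	/* no cycle is possible */
}
```

Calling self_check() from a main() reports all three threads stuck in the multi-vCPU case and none with a single vCPU, which at least matches the observation that a 1-vCPU guest boots. Whether the real wait sets look like this is exactly the open question.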

— RHC.