Re: git: c0f35dbf19c3 - main - vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.

From: Kristof Provost <kp_at_FreeBSD.org>
Date: Mon, 21 Nov 2022 13:15:45 UTC
On 18 Nov 2022, at 19:26, John Baldwin wrote:
> The branch main has been updated by jhb:
>
> URL: 
> https://cgit.FreeBSD.org/src/commit/?id=c0f35dbf19c3c8825bd2b321d8efd582807d1940
>
> commit c0f35dbf19c3c8825bd2b321d8efd582807d1940
> Author:     John Baldwin <jhb@FreeBSD.org>
> AuthorDate: 2022-11-18 18:05:10 +0000
> Commit:     John Baldwin <jhb@FreeBSD.org>
> CommitDate: 2022-11-18 18:25:38 +0000
>
>     vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
>
>     Retire the boot_state member of struct vlapic and instead use a 
> cpuset
>     in the VM to track vCPUs waiting for STARTUP IPIs.  INIT IPIs add
>     vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
>     STARTUP IPIs are only reported to userland for vCPUs that were 
> removed
>     from the set.
>
>     In particular, this permits a subsequent change to allocate vCPUs 
> on
>     demand when the vCPU may not be allocated until after a STARTUP 
> IPI is
>     reported to userland.
>
>     Reviewed by:    corvink, markj
>     Differential Revision:  https://reviews.freebsd.org/D37173
> ---
>  sys/amd64/include/vmm.h        |  3 +++
>  sys/amd64/vmm/io/vlapic.c      | 46 
> ++++++++++--------------------------------
>  sys/amd64/vmm/io/vlapic_priv.h |  7 -------
>  sys/amd64/vmm/vmm.c            | 27 +++++++++++++++++++++++++
>  4 files changed, 41 insertions(+), 42 deletions(-)
>

I’m seeing a panic starting a bhyve guest, and I think this commit 
might be the responsible one:

	login: panic: acquiring blockable sleep lock with spinlock or critical 
section held (sleep mutex) vm rendezvous lock @ 
/usr/src/sys/amd64/vmm/vmm.c:2508
	cpuid = 14
	time = 1669035212
	KDB: stack backtrace:
	db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfffffe015f2dd530
	vpanic() at vpanic+0x151/frame 0xfffffe015f2dd580
	panic() at panic+0x43/frame 0xfffffe015f2dd5e0
	witness_checkorder() at witness_checkorder+0xd3e/frame 
0xfffffe015f2dd7a0
	__mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe015f2dd7f0
	vm_start_cpus() at vm_start_cpus+0x31/frame 0xfffffe015f2dd820
	vlapic_icrlo_write_handler() at vlapic_icrlo_write_handler+0x30c/frame 
0xfffffe015f2dd8a0
	vmx_run() at vmx_run+0x20ed/frame 0xfffffe015f2dd9c0
	vm_run() at vm_run+0x1d2/frame 0xfffffe015f2ddaa0
	vmmdev_ioctl() at vmmdev_ioctl+0x644/frame 0xfffffe015f2ddb40
	devfs_ioctl() at devfs_ioctl+0xcd/frame 0xfffffe015f2ddb90
	vn_ioctl() at vn_ioctl+0x131/frame 0xfffffe015f2ddca0
	devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe015f2ddcc0
	kern_ioctl() at kern_ioctl+0x202/frame 0xfffffe015f2ddd30
	sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe015f2dde00
	amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015f2ddf30
	fast_syscall_common() at fast_syscall_common+0xf8/frame 
0xfffffe015f2ddf30
	--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x3dd9cbd7f94a, rsp = 
0x3ddc1d44ae58, rbp = 0x3ddc1d44af10 ---

And kgdb’s backtrace:

	(kgdb) bt
	#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
	#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
	#2  0xffffffff80bec678 in dumpsys (di=0x0) at 
/usr/src/sys/x86/include/dump.h:87
	#3  doadump (textdump=textdump@entry=0) at 
/usr/src/sys/kern/kern_shutdown.c:434
	#4  0xffffffff804b403a in db_dump (dummy=<optimized out>, 
dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at 
/usr/src/sys/ddb/db_command.c:593
	#5  0xffffffff804b3e40 in db_command (last_cmdp=<optimized out>, 
cmd_table=<optimized out>, dopager=true) at 
/usr/src/sys/ddb/db_command.c:506
	#6  0xffffffff804b3b0d in db_command_loop () at 
/usr/src/sys/ddb/db_command.c:553
	#7  0xffffffff804b71a6 in db_trap (type=<optimized out>, 
code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
	#8  0xffffffff80c3b89e in kdb_trap (type=type@entry=3, 
code=<unavailable>, code@entry=0, tf=tf@entry=0xfffffe015f2dd470) at 
/usr/src/sys/kern/subr_kdb.c:745
	#9  0xffffffff810ce577 in trap (frame=0xfffffe015f2dd470) at 
/usr/src/sys/amd64/amd64/trap.c:611
	#10 <signal handler called>
	#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at 
/usr/src/sys/kern/subr_kdb.c:509
	#12 0xffffffff80bec822 in vpanic (fmt=<optimized out>, 
ap=ap@entry=0xfffffe015f2dd5c0) at /usr/src/sys/kern/kern_shutdown.c:967
	#13 0xffffffff80bec5c3 in panic (fmt=0xffffffff81e8ce70 <cnputs_mtx> 
"\314:)\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:903
	#14 0xffffffff80c5ea4e in witness_checkorder (lock=0xfffff804d585d138, 
flags=<optimized out>, file=<optimized out>, line=2508, 
interlock=<optimized out>)
	    at /usr/src/sys/kern/subr_witness.c:1202
	#15 0xffffffff80bc6d04 in __mtx_lock_flags (c=0xfffff804d585d150, 
opts=0, file=0xffffffff8322542d "/usr/src/sys/amd64/vmm/vmm.c", 
line=2508) at /usr/src/sys/kern/kern_mutex.c:278
	#16 0xffffffff83204101 in vm_start_cpus (vm=0xfffff804d585d000, 
tostart=tostart@entry=0xfffffe015f2dd858) at 
/usr/src/sys/amd64/vmm/vmm.c:2508
	#17 0xffffffff83211b5c in vlapic_icrlo_write_handler 
(vlapic=vlapic@entry=0xfffff8048dc14a80, 
retu=retu@entry=0xfffffe015f2dd950) at 
/usr/src/sys/amd64/vmm/io/vlapic.c:1172
	#18 0xffffffff8321977d in vmx_handle_apic_write 
(vcpu=0xfffff8048db3ac00, vlapic=0xfffff8048dc14a80, qual=768) at 
/usr/src/sys/amd64/vmm/intel/vmx.c:2184
	#19 vmx_exit_process (vmx=0xfffff804d585b000, vcpu=0xfffff8048db3ac00, 
vmexit=0xfffff804a7c1a688) at /usr/src/sys/amd64/vmm/intel/vmx.c:2767
	#20 vmx_run (vcpui=0xfffff8048db3ac00, rip=<optimized out>, 
pmap=0xfffffe003dce1530, evinfo=0xfffffe015f2dd9d8) at 
/usr/src/sys/amd64/vmm/intel/vmx.c:3174
	#21 0xffffffff83202572 in vm_run (vcpu=vcpu@entry=0xfffff804a7c1a600, 
vme_user=vme_user@entry=0xfffff80032601b08) at 
/usr/src/sys/amd64/vmm/vmm.c:1873
	#22 0xffffffff83205ab4 in vmmdev_ioctl (cdev=<optimized out>, 
cmd=3230692865, data=<optimized out>, fflag=<optimized out>, 
td=<optimized out>) at /usr/src/sys/amd64/vmm/vmm_dev.c:565
	#23 0xffffffff80a7be7d in devfs_ioctl (ap=0xfffffe015f2ddba8) at 
/usr/src/sys/fs/devfs/devfs_vnops.c:933
	#24 0xffffffff80cf6421 in vn_ioctl (fp=0xfffff8000ce2cb40, 
com=<optimized out>, data=0xfffff80032601b00, 
active_cred=0xfffff8000c4d0800, td=0x10) at 
/usr/src/sys/kern/vfs_vnops.c:1699
	#25 0xffffffff80a7c52e in devfs_ioctl_f (fp=0xffffffff81e8ce70 
<cnputs_mtx>, com=0, data=0xffffffff812a0000, cred=0x1, td=0x10) at 
/usr/src/sys/fs/devfs/devfs_vnops.c:864
	#26 0xffffffff80c64672 in fo_ioctl (fp=0xfffff8000ce2cb40, 
com=3230692865, data=0x1d0, active_cred=0x1, td=<optimized out>) at 
/usr/src/sys/sys/file.h:365
	#27 kern_ioctl (td=td@entry=0xfffffe0160936000, fd=<optimized out>, 
com=com@entry=3230692865, data=0x1d0 <error: Cannot access memory at 
address 0x1d0>, data@entry=0xfffff80032601b00 "")
	    at /usr/src/sys/kern/sys_generic.c:803
	#28 0xffffffff80c643ba in sys_ioctl (td=0xfffffe0160936000, 
uap=0xfffffe01609363f8) at /usr/src/sys/kern/sys_generic.c:711
	#29 0xffffffff810cf3be in syscallenter (td=<optimized out>) at 
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
	#30 amd64_syscall (td=0xfffffe0160936000, traced=0) at 
/usr/src/sys/amd64/amd64/trap.c:1200
	#31 <signal handler called>
	#32 0x00003dd9cbd7f94a in ?? ()
	Backtrace stopped: Cannot access memory at address 0x3ddc1d44ae58
	(kgdb) fr 16
	#16 0xffffffff83204101 in vm_start_cpus (vm=0xfffff804d585d000, 
tostart=tostart@entry=0xfffffe015f2dd858) at 
/usr/src/sys/amd64/vmm/vmm.c:2508
	2508		mtx_lock(&vm->rendezvous_mtx);

I believe WITNESS is upset that we’re going to potentially sleep doing 
mtx_lock(&vm->rendezvous_mtx); in vm_start_cpus() when we’ve done 
critical_enter() in vm_run().

Best regards,
Kristof