Re: bhyve -D not cleaning up after itself
- In reply to: David E. Cross: "bhyve -D not cleaning up after itself"
Date: Mon, 29 Nov 2021 21:19:59 UTC
On Sat, Nov 27, 2021 at 02:40:57AM -0500, David E. Cross wrote:
> I have noticed for awhile that bhyve -D doesn't seem to actually do what
> is claimed (to destroy a VM on guest initiated power-off). This
> evening I decided to ktrace it to see if I was just not getting
> something about how this was supposed to work, and found:
>
>  68613 vcpu 0   CALL  __sysctlbyname(0x1ebcdb20a133,0xe,0,0,0x1ebce4ba60f0,0x9)
>  68613 vcpu 0   SCTL  "hw.vmm.destroy"
>  68613 vcpu 0   RET   __sysctlbyname -1 errno 1 Operation not permitted
>  68613 vcpu 0   CALL  exit(0x1)
>
> Reading quickly the kernel code for vm_destroy(), I find 2 candidates:
>
> static int
> vmm_priv_check(struct ucred *ucred)
> {
>
>         if (jailed(ucred) &&
>             !(ucred->cr_prison->pr_allow & pr_allow_flag))
>                 return (EPERM);
>
>         return (0);
> }
>
> This doesn't seem to be it, my process is not jailed.
>
> That leads to the only other (I think) call in sysctl_vmm_destroy that
> could return EPERM:
>
>         error = sysctl_handle_string(oidp, buf, buflen, req);
>
> But I am just not seeing it. Also this EXACT same call works from the
> context of bhyvectl --vm=FOO --destroy, run from the same shell as the
> bhyve process that just terminated. Is the 'ctx' somehow incorrect in
> bhyve? I is used earlier in that function, so I am assuming it is right?

The problem is that bhyve runs in capability mode (see capsicum(4)),
which restricts access to the sysctl namespace. Most sysctls are not
accessible from a sandboxed process, hw.vmm.destroy among them, so the
write fails with EPERM and -D is effectively broken.

One possible solution is to spawn an unsandboxed helper process which
can write the sysctl on bhyve's behalf. That is a rather heavyweight
solution, though.

Earlier this year some work was done on using a file descriptor-based
interface to create and destroy VMs, moving away from the old
sysctl-based interface. It's stalled at the moment but I hope to return
to that work quite soon. That should also help fix the problem but will
take some time to complete.

I think it may be easiest to simply allow writes to the sysctl for the time being: https://reviews.freebsd.org/D33169
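
For illustration only, the helper-process alternative mentioned above could
look roughly like the sketch below. It is not code from bhyve or from the
review. The names destroy_vm(), helper_main(), destroy_via_helper() and
VM_NAME_MAX are hypothetical, the privileged sysctl write is stubbed out, and
the point where bhyve would enter capability mode is only marked with a
comment, so that the sketch builds on any POSIX system.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define VM_NAME_MAX 64	/* hypothetical limit, chosen for this sketch */

/*
 * Hypothetical stand-in for the privileged step. On FreeBSD the helper
 * would write the VM name to the sysctl here, roughly:
 *   sysctlbyname("hw.vmm.destroy", NULL, NULL, name, strlen(name));
 * It is stubbed so the sketch stays portable.
 */
static int
destroy_vm(const char *name)
{
	(void)name;
	return (0);
}

/* Helper side: read one VM name, act on it, send back an errno value. */
static void
helper_main(int req_fd, int resp_fd)
{
	char name[VM_NAME_MAX];
	ssize_t n;
	int error;

	n = read(req_fd, name, sizeof(name) - 1);
	if (n <= 0)
		_exit(0);	/* the parent closed the pipe without asking */
	name[n] = '\0';
	error = destroy_vm(name);
	if (write(resp_fd, &error, sizeof(error)) != (ssize_t)sizeof(error))
		_exit(1);
	_exit(0);
}

/*
 * Fork the helper first, then (in a real program) sandbox the parent,
 * then ask the helper to destroy the VM. Returns 0 or an errno value.
 */
int
destroy_via_helper(const char *vmname)
{
	int req[2], resp[2], error, ok;
	size_t len = strlen(vmname);
	pid_t pid;

	if (len == 0 || len >= VM_NAME_MAX)
		return (EINVAL);
	if (pipe(req) != 0)
		return (errno);
	if (pipe(resp) != 0) {
		error = errno;
		close(req[0]);
		close(req[1]);
		return (error);
	}
	pid = fork();
	if (pid == -1) {
		error = errno;
		close(req[0]);
		close(req[1]);
		close(resp[0]);
		close(resp[1]);
		return (error);
	}
	if (pid == 0) {
		close(req[1]);
		close(resp[0]);
		helper_main(req[0], resp[1]);	/* does not return */
	}
	close(req[0]);
	close(resp[1]);

	/* A real bhyve would enter capability mode around here. */

	ok = write(req[1], vmname, len) == (ssize_t)len;
	close(req[1]);	/* gives the helper EOF if the write failed */
	if (!ok || read(resp[0], &error, sizeof(error)) != (ssize_t)sizeof(error))
		error = EIO;
	close(resp[0]);
	(void)waitpid(pid, NULL, 0);
	return (error);
}
```

The helper has to be forked before the parent sandboxes itself, since the
child inherits the parent's restrictions. Keeping an extra process alive for
the lifetime of every VM just to issue one sysctl write at exit is what makes
this approach heavyweight.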