Re: Propose a new stage `vnet_shutdown` before `vnet_destroy`
- In reply to: James Gritton : "Re: Propose a new stage `vnet_shutdown` before `vnet_destroy`"
Date: Fri, 06 Jan 2023 10:14:06 UTC
> On Dec 19, 2022, at 1:44 AM, James Gritton <jamie@freebsd.org> wrote:
>
> On 2022-12-18 00:01, Zhenlei Huang wrote:
>> I'm currently working on a route nexthop caching feature for tunneling
>> interfaces such as if_gif, if_gre, if_vxlan, and potentially if_wg.
>> I encountered a nasty bug related to the VNET lifecycle.
>>
>> More precisely, I'd like to call `rib_unsubscribe()` to unsubscribe
>> from route events when the interface tunnel is deleted
>> (gif_delete_tunnel). But while the VNET is shutting down, the VNET
>> SYSUNINITs are called and the routing vnet subsystem is destroyed
>> before the interface goes down, which causes a page fault. I do not
>> want to check the `vnet.vnet_shutdown` state, as it looks messed up.
>>
>> I've recently been reviewing the life cycle of prisons and got some
>> inspiration. When a jail / prison is submitted for destruction (by the
>> jail_remove syscall), SIGKILL is sent to the prison's processes. I
>> think that is the correct order in which to destroy a jail / prison.
>> To summarize, the life cycle of a jail / prison is:
>>
>> on jail create: PRISON_STATE_INVALID -> create VNET ->
>> PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
>> addresses, routing, etc.) -> create / attach (network) processes
>>
>> on jail destroy: jexec kill processes (1) by user -> mark it as
>> PRISON_STATE_DYING -> send SIGKILL to processes by kernel (2) ->
>> destroy VNET (when the prison's last pr_ref is dropped) -> dead
>>
>> Step (2) is a cleanup by the kernel, since (1) may not have been done
>> by the user.
>>
>> That leads to the idea about the life cycle of the VNET. On jail
>> destroy, the network resources are cleaned up by vnet_destroy
>> (SYSUNINIT). The ordering of the network components' SYSUNINITs is
>> then a hack, because circular network resource dependencies are
>> possible. For example, the routing table entries (nhop) hold
>> references to the ifnet, and the ifnet holds references to the route
>> nhop (cache), as I encountered.
>>
>> Just as the kernel cleans up processes, we can introduce a new stage,
>> `vnet_shutdown`, that cleans up network resources. When a jail /
>> prison is going to die, after the kernel has cleaned up its processes
>> it calls `vnet_shutdown` to clean up network resources; vnet_destroy
>> will then go smoothly, as no circular network resource dependency
>> remains.
>>
>> The life cycle of a prison becomes:
>>
>> on jail create: PRISON_STATE_INVALID -> create VNET ->
>> PRISON_STATE_ALIVE -> set up network resources (ifnet, interface
>> addresses, routing, etc.) -> create / attach (network) processes
>>
>> on jail destroy: jexec kill processes (1) by user -> mark it as
>> PRISON_STATE_DYING -> send SIGKILL to processes by kernel (2) ->
>> vnet_shutdown cleans up network resources -> destroy VNET (when the
>> prison's last pr_ref is dropped) -> dead
>>
>> This idea is still immature and I hope to hear more opinions about it.
>
> This is absolutely the direction things need to go. Vnet isn't the
> only thing that can have these problems, though it's been the biggest
> offender. There could also be cycles that involve more than one
> subsystem, which could be helped by broad application of this idea.
>
> There's a function in kern_jail.c ready for this: prison_cleanup.
> It's called in the "mark PRISON_STATE_DYING" stage of things. That's
> before the "send SIGKILL" part of your sequence, but otherwise fits.
>

Submitted to Phabricator for review:
https://reviews.freebsd.org/D37956
https://reviews.freebsd.org/D37957

> - Jamie

Best regards,
Zhenlei
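
For readers following the thread, the ordering problem discussed above can be illustrated with a small user-space sketch in C. This is not kernel code and is not taken from the Phabricator reviews: every name in it (`sim_vnet`, `sim_tunnel_delete`, `sim_vnet_shutdown`, `sim_vnet_destroy`, `sim_jail_destroy`) is a hypothetical stand-in, and the kernel page fault is modeled as nothing more than a counter. It assumes, as described in the message, that teardown destroys the routing subsystem before the tunnel interface goes down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VNET; not the kernel's struct vnet. */
struct sim_vnet {
	bool rib_alive;   /* routing subsystem still usable */
	bool subscribed;  /* tunnel holds a route-event subscription */
	int  faults;      /* times a destroyed subsystem was touched */
};

/* Models tunnel deletion, which wants to unsubscribe from route events. */
static void
sim_tunnel_delete(struct sim_vnet *v)
{
	if (!v->subscribed)
		return;
	if (!v->rib_alive)
		v->faults++;  /* in the kernel this would be a page fault */
	v->subscribed = false;
}

/* Proposed extra stage: release network resources while everything is alive. */
static void
sim_vnet_shutdown(struct sim_vnet *v)
{
	assert(v->rib_alive);
	sim_tunnel_delete(v);
}

/* Teardown in the problematic order: routing first, interfaces second. */
static void
sim_vnet_destroy(struct sim_vnet *v)
{
	v->rib_alive = false;
	sim_tunnel_delete(v);
}

/* Returns the number of simulated faults seen while destroying a jail. */
int
sim_jail_destroy(bool with_shutdown)
{
	struct sim_vnet v = { .rib_alive = true, .subscribed = true, .faults = 0 };

	if (with_shutdown)
		sim_vnet_shutdown(&v);
	sim_vnet_destroy(&v);
	return (v.faults);
}
```

Calling `sim_jail_destroy(false)` goes straight to the destroy stage and touches the already-destroyed routing state once; `sim_jail_destroy(true)` drops the subscription first, so the destroy stage has nothing left to unsubscribe.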