Re: Warm and Live Migration Implementation for bhyve
Date: Mon, 26 Jun 2023 07:16:08 UTC
Hi Elena,

thanks for posting this proposal here.

Some open questions from my side:

1. How is the data sent to the target? Does the host send a complete
   dump and the target parses it? Or does the target request data one
   by one and the host sends it as a response?

2. What happens if we add a new data section?

3. What happens if the bhyve version differs on host and target
   machine?

--
Kind regards,
Corvin

On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote:
> Hello,
>
> This mail presents the migration feature we have implemented for
> bhyve. Any feedback from the community is much appreciated.
>
> We have opened a stack of reviews on Phabricator
> (https://reviews.freebsd.org/D34717) that is meant to split the code
> into smaller parts so it can be more easily reviewed. A brief history
> of the implementation can be found at the bottom of this email.
>
> The migration mechanism we propose needs two main components in order
> to move a virtual machine from one host to another:
> 1. the guest's state (vCPUs, emulated and virtualized devices)
> 2. the guest's memory
>
> For the first part, we rely on the suspend/resume feature. We call the
> same functions as the ones used by suspend/resume, but instead of
> saving the data in files, we send it via the network.
>
> The most time-consuming aspect of migration is transmitting guest
> memory. The UPB team has implemented two options to accomplish this:
> 1. Warm Migration: The guest execution is suspended on the source host
> while the memory is sent to the destination host. This method is less
> complex but may cause extended downtime.
> 2. Live Migration: The guest continues to execute on the source host
> while the memory is transmitted to the destination host. This method
> is more complex but offers reduced downtime.
>
> The proposed live migration procedure (pre-copy live migration)
> migrates the memory in rounds:
> 1. In the initial round, we migrate all the guest memory (all pages
> that are allocated)
> 2. In the subsequent rounds, we migrate only the pages that were
> modified since the previous round started
> 3. In the final round, we suspend the guest, migrate the remaining
> pages that were modified since the previous round and the guest's
> internal state (vCPU, emulated and virtualized devices).
>
> To detect the pages that were modified between rounds, we propose an
> additional dirty bit (virtualization dirty bit) for each memory page.
> This bit would be set every time the page's dirty bit is set. However,
> this virtualization dirty bit is reset only when the page is migrated.
>
> The proposed implementation is split in two parts:
> 1. The first one, the warm migration, is just a wrapper on the
> suspend/resume feature which, instead of saving the suspended state on
> disk, sends it via the network to the destination
> 2. The second part, the live migration, uses the layer previously
> presented, but sends the guest's memory in rounds, as described above.
>
> The migration process works as follows:
> 1. we identify:
>   - VM_NAME - the name of the virtual machine which will be migrated
>   - SRC_IP - the IP address of the source host
>   - DST_IP - the IP address of the destination host
>   - DST_PORT - the port we want to use for migration (default is 24983)
> 2. we start a virtual machine on the destination host that will wait
> for a migration. Here, we must specify SRC_IP (and the port we want to
> open for migration, default is 24983).
> e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
> 3. using bhyvectl on the source host, we start the migration process.
> e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
>
> A full tutorial on this can be found here:
> https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve
>
> For sending the migration request to a virtual machine, we use the
> same thread/socket that is used for suspend.
> For receiving a migration request, we used a similar approach to the
> resume process.
>
> As some of you may remember seeing similar emails from us on the
> freebsd-virtualization list, I'll present a brief history of this
> project:
> The first part of the project was the suspend/resume implementation
> which landed in bhyve in 2020, under the BHYVE_SNAPSHOT guard
> (https://reviews.freebsd.org/D19495).
> After that, we focused on two tracks:
> 1. adding various suspend/resume features (multiple device support -
> https://reviews.freebsd.org/D26387, CAPSICUM support -
> https://reviews.freebsd.org/D30471, having a uniform file format -
> during the bhyve bi-weekly calls, we concluded that the JSON format
> was the most suitable at that time -
> https://reviews.freebsd.org/D29262) so we can remove the #ifdef
> BHYVE_SNAPSHOT guard.
> 2. implementing the migration feature for bhyve. Since this one relies
> on the save/restore, but does not modify its behaviour, we considered
> we can go in parallel with both tracks.
> We had various presentations in the FreeBSD Community on these topics:
> AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020,
> AsiaBSDCon2023.
>
> The first patches for warm and live migration were opened in 2021:
> https://reviews.freebsd.org/D28270,
> https://reviews.freebsd.org/D30954. However, the general feedback on
> these was that the patches are too big to be reviewed, so we should
> split them into smaller chunks (this was also true for some of the
> suspend/resume improvements). Thus, we split them into smaller parts.
> Also, as things changed in bhyve (i.e., capsicum support for
> suspend/resume was added this year), we rebased and updated our
> reviews.
>
> Thank you,
> Elena
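
To make sure I read the pre-copy rounds in the quoted proposal correctly, here is a toy model in Python. It is only a sketch, not bhyve code: the names Guest, send_pages, migrate_precopy and shrinking_workload are made up for illustration, and the stop criteria (threshold, max_rounds) are my assumption, since the mail does not say when the final round is triggered. The extra dirty bit is modelled as a set that is updated on every guest write and cleared only when a page is sent.

```python
# Toy model of pre-copy live migration. Not bhyve code; every name here
# is hypothetical and exists only to illustrate the rounds.

class Guest:
    def __init__(self, num_pages):
        self.pages = [0] * num_pages
        # Models the extra "virtualization dirty bit": set on every
        # write, cleared only when the page is migrated.
        self.migration_dirty = set()
        self.running = True

    def write(self, page, value):
        assert self.running, "a suspended guest cannot write memory"
        self.pages[page] = value
        self.migration_dirty.add(page)


def send_pages(src, dst_pages, page_numbers):
    """Copy the given pages to the destination and clear their dirty bits."""
    for p in sorted(page_numbers):
        dst_pages[p] = src.pages[p]
        src.migration_dirty.discard(p)


def migrate_precopy(src, workload, max_rounds=5, threshold=2):
    """Return (destination pages, number of rounds including the final one).

    workload(src, round_no) stands in for the guest writes that happen
    while a round is in flight. threshold and max_rounds are assumed
    stop criteria, not something taken from the proposal.
    """
    dst_pages = [None] * len(src.pages)
    to_send = set(range(len(src.pages)))      # round 1: all pages
    rounds = 0
    while True:
        rounds += 1
        send_pages(src, dst_pages, to_send)
        workload(src, rounds)                 # guest keeps running
        to_send = set(src.migration_dirty)    # pages modified this round
        if len(to_send) <= threshold or rounds >= max_rounds:
            break
    # Final round: suspend the guest, then send what is still dirty.
    # The vCPU and device state would be sent here as well.
    src.running = False
    send_pages(src, dst_pages, to_send)
    return dst_pages, rounds + 1


def shrinking_workload(guest, round_no):
    # Example workload that dirties fewer pages in each round.
    for p in range(8 >> round_no):
        guest.write(p, round_no)
```

Calling migrate_precopy(Guest(16), shrinking_workload) runs the model end to end. If the real implementation differs, for example in when the bit is cleared relative to the send, please correct me.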