Warm and Live Migration Implementation for bhyve
Date: Fri, 23 Jun 2023 10:00:32 UTC
Hello,

This mail presents the migration feature we have implemented for bhyve. Any feedback from the community is much appreciated.

We have opened a stack of reviews on Phabricator (https://reviews.freebsd.org/D34717) that splits the code into smaller parts so it can be reviewed more easily. A brief history of the implementation can be found at the bottom of this email.

The migration mechanism we propose needs two main components in order to move a virtual machine from one host to another:
1. the guest's state (vCPUs, emulated and virtualized devices)
2. the guest's memory

For the first part, we rely on the suspend/resume feature. We call the same functions as suspend/resume, but instead of saving the data to files, we send it over the network.

The most time-consuming aspect of migration is transmitting guest memory. The UPB team has implemented two options to accomplish this:
1. Warm Migration: the guest execution is suspended on the source host while the memory is sent to the destination host. This method is less complex but may cause extended downtime.
2. Live Migration: the guest continues to execute on the source host while the memory is transmitted to the destination host. This method is more complex but offers reduced downtime.

The proposed live migration procedure (pre-copy live migration) migrates the memory in rounds:
1. In the initial round, we migrate all the guest memory (all pages that are allocated).
2. In the subsequent rounds, we migrate only the pages that were modified since the previous round started.
3. In the final round, we suspend the guest and migrate the remaining pages that were modified since the previous round, together with the guest's internal state (vCPUs, emulated and virtualized devices).

To detect the pages that were modified between rounds, we propose an additional dirty bit (the virtualization dirty bit) for each memory page. This bit would be set every time the page's dirty bit is set.
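
To make the round structure concrete, here is a minimal Python simulation of pre-copy migration next to warm migration. It is a sketch under stated assumptions, not the bhyve implementation: guest memory is modeled as a list of `Page` objects, the names `Page`, `Guest`, `send_pages`, `dirty_pages`, `precopy_migrate` and `warm_migrate` are hypothetical, and the stop criteria (`max_rounds`, plus a `threshold` on the number of still-dirty pages) are assumptions, since the rounds described above do not say when to stop. Each guest write sets both the ordinary dirty bit and the virtualization dirty bit (`vdirty`), and only sending a page clears `vdirty`.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: int = 0
    dirty: bool = False   # ordinary dirty bit (assumed: other code may clear it)
    vdirty: bool = False  # virtualization dirty bit: cleared only when migrated


@dataclass
class Guest:
    pages: list
    running: bool = True

    def write(self, idx: int, value: int) -> None:
        """A guest store: setting the dirty bit also sets the virtualization dirty bit."""
        assert self.running, "a suspended guest does not execute"
        page = self.pages[idx]
        page.data = value
        page.dirty = True
        page.vdirty = True


def send_pages(guest, indices, dest):
    """Copy the given pages to `dest` and reset their virtualization dirty bit."""
    count = 0
    for i in indices:
        dest[i] = guest.pages[i].data
        guest.pages[i].vdirty = False
        count += 1
    return count


def dirty_pages(guest):
    """Indices of pages modified since they were last sent."""
    return [i for i, p in enumerate(guest.pages) if p.vdirty]


def warm_migrate(guest):
    """Warm migration: suspend the guest first, then send all memory in one pass."""
    guest.running = False
    dest = [0] * len(guest.pages)
    send_pages(guest, range(len(guest.pages)), dest)
    return dest


def precopy_migrate(guest, run_guest, max_rounds=5, threshold=2):
    """Pre-copy live migration; returns (destination memory, pages sent per round).

    run_guest(guest) stands in for the guest executing while a round is sent.
    max_rounds and threshold are hypothetical stop criteria for this sketch.
    """
    dest = [0] * len(guest.pages)
    # Initial round: send all guest memory while the guest keeps running.
    sent = [send_pages(guest, range(len(guest.pages)), dest)]
    run_guest(guest)
    # Subsequent rounds: only pages modified since the previous round.
    for _ in range(max_rounds):
        dirty = dirty_pages(guest)
        if len(dirty) <= threshold:
            break  # few enough dirty pages to finish with the guest suspended
        sent.append(send_pages(guest, dirty, dest))
        run_guest(guest)
    # Final round: suspend the guest and send the remaining dirty pages.
    guest.running = False
    sent.append(send_pages(guest, dirty_pages(guest), dest))
    # The vCPU and device state would be sent here too (not modeled).
    return dest, sent
```

For example, with eight pages and a workload that rewrites pages 0 to 3 after the first round, pages 1 and 2 after the second, and page 2 after the third, `precopy_migrate(guest, workload, threshold=1)` sends 8, 4, 2 and finally 1 page. Because `vdirty` is cleared only in `send_pages`, clearing the ordinary `dirty` flag elsewhere cannot hide a modified page from the next round. Only the final round runs with the guest suspended, whereas `warm_migrate` keeps the guest suspended while the whole memory is sent.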
However, unlike the page's dirty bit, the virtualization dirty bit is reset only when the page is migrated.

The proposed implementation is split into two parts:
1. The first one, warm migration, is a wrapper around the suspend/resume feature which, instead of saving the suspended state to disk, sends it over the network to the destination.
2. The second part, live migration, uses the layer presented above but sends the guest's memory in rounds, as described above.

The migration process works as follows:
1. We identify:
   - VM_NAME - the name of the virtual machine that will be migrated
   - SRC_IP - the IP address of the source host
   - DST_IP - the IP address of the destination host
   - DST_PORT - the port we want to use for migration (default is 24983)
2. We start a virtual machine on the destination host that will wait for a migration. Here, we must specify SRC_IP (and the port we want to open for migration, default is 24983), e.g.:
   bhyve ... -R SRC_IP:24983 guest_vm_dst
3. Using bhyvectl on the source host, we start the migration process, e.g.:
   bhyvectl --migrate=DST_IP:24983 --vm=guest_vm

A full tutorial can be found here: https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve

To send the migration request to a virtual machine, we use the same thread/socket that is used for suspend. To receive a migration request, we use an approach similar to the resume process.

As some of you may remember from similar emails we sent to the freebsd-virtualization list, here is a brief history of this project.

The first part of the project was the suspend/resume implementation, which landed in bhyve in 2020 under the BHYVE_SNAPSHOT guard (https://reviews.freebsd.org/D19495). After that, we focused on two tracks:
1. adding various suspend/resume features so that the #ifdef BHYVE_SNAPSHOT guard can be removed: multiple device support (https://reviews.freebsd.org/D26387), Capsicum support (https://reviews.freebsd.org/D30471), and a uniform file format (https://reviews.freebsd.org/D29262); at that time, during the bhyve bi-weekly calls, we concluded that JSON was the most suitable format.
2. implementing the migration feature for bhyve. Since migration relies on save/restore but does not modify its behaviour, we considered that we could pursue both tracks in parallel.

We gave several presentations on these topics in the FreeBSD community: AsiaBSDCon 2018, AsiaBSDCon 2019, BSDCan 2019, BSDCan 2020, and AsiaBSDCon 2023.

The first patches for warm and live migration were opened in 2021: https://reviews.freebsd.org/D28270 and https://reviews.freebsd.org/D30954. However, the general feedback was that the patches were too big to review and should be split into smaller chunks (this was also true for some of the suspend/resume improvements), so we split them into smaller parts. Also, as things changed in bhyve (e.g., Capsicum support for suspend/resume was added this year), we rebased and updated our reviews.

Thank you,
Elena