Warm and Live Migration Implementation for bhyve

From: Elena Mihailescu <elenamihailescu22_at_gmail.com>
Date: Fri, 23 Jun 2023 10:00:32 UTC

Hello,

This mail presents the migration feature we have implemented for
bhyve. Any feedback from the community is much appreciated.

We have opened a stack of reviews on Phabricator
(https://reviews.freebsd.org/D34717) that splits the code into smaller
parts so that it is easier to review. A brief history of the
implementation can be found at the bottom of this email.

The migration mechanism we propose needs two main components in order
to move a virtual machine from one host to another:
1. the guest's state (vCPUs, emulated and virtualized devices)
2. the guest's memory

For the first part, we rely on the suspend/resume feature. We call the
same functions as the ones used by suspend/resume, but instead of
saving the data in files, we send it via the network.
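
To illustrate the idea of reusing the same save path for both files and
the network, here is a minimal Python sketch (not the bhyve code, which
is written in C). The record layout, the function names send_state and
recv_state, and the "vcpu0" record name are all assumptions made for
this example. The only point is that the writer does not need to know
whether its sink is a file or a socket.

```python
import io
import socket
import struct


def send_state(sink, name, payload):
    """Write one length-prefixed record to any binary file-like sink."""
    encoded = name.encode("utf-8")
    # Hypothetical header: 2-byte name length, 4-byte payload length.
    sink.write(struct.pack("!HI", len(encoded), len(payload)))
    sink.write(encoded)
    sink.write(payload)


def recv_state(source):
    """Read back one record written by send_state."""
    name_len, payload_len = struct.unpack("!HI", source.read(6))
    name = source.read(name_len).decode("utf-8")
    return name, source.read(payload_len)


def demo():
    # Suspend-style path: the sink is a file-like buffer.
    buf = io.BytesIO()
    send_state(buf, "vcpu0", b"\x01\x02")
    buf.seek(0)
    from_file = recv_state(buf)

    # Migration-style path: the same function, but the sink is a socket.
    left, right = socket.socketpair()
    with left, right:
        with left.makefile("wb") as out, right.makefile("rb") as inp:
            send_state(out, "vcpu0", b"\x01\x02")
            out.flush()
            from_net = recv_state(inp)
    return from_file, from_net
```

Calling demo() returns the same (name, payload) pair from both paths.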

The most time consuming aspect of migration is transmitting guest
memory. The UPB team has implemented two options to accomplish this:
1. Warm Migration: The guest execution is suspended on the source host
while the memory is sent to the destination host. This method is less
complex but may cause extended downtime.
2. Live Migration: The guest continues to execute on the source host
while the memory is transmitted to the destination host. This method
is more complex but offers reduced downtime.

The proposed live migration procedure (pre-copy live migration)
migrates the memory in rounds:
1. In the initial round, we migrate all the guest memory (all pages
that are allocated)
2. In the subsequent rounds, we migrate only the pages that were
modified since the previous round started
3. In the final round, we suspend the guest, then migrate the pages
that were modified since the previous round started, together with the
guest's internal state (vCPUs, emulated and virtualized devices).
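
The rounds above can be simulated with a small Python sketch. This is a
toy model, not the bhyve implementation: guest memory is a dictionary,
guest_writes is a hypothetical callback that returns the pages the
guest modifies while a round is in flight, and the stopping rule
(threshold, max_rounds) is an assumption, since this mail does not say
when the final round is triggered.

```python
def precopy_migrate(source, guest_writes, threshold=1, max_rounds=10):
    """Copy `source` (page number -> bytes) to a new dict in rounds."""
    dest = {}
    to_send = set(source)  # initial round: every allocated page
    rounds = 0
    while True:
        rounds += 1
        # Contents of the pages as they are read for this round.
        in_flight = {page: source[page] for page in to_send}
        # The guest keeps running and modifies pages during the round.
        writes = guest_writes(rounds)
        source.update(writes)
        dest.update(in_flight)
        # Next round: only pages modified since this round started.
        to_send = set(writes)
        if len(to_send) <= threshold or rounds >= max_rounds:
            break
    # Final round: the guest is suspended, so no further writes occur.
    dest.update({page: source[page] for page in to_send})
    return dest, rounds + 1


def demo():
    source = {0: b"a", 1: b"b", 2: b"c", 3: b"d"}
    # Hypothetical workload: three pages change in round 1, one in round 2.
    workload = {1: {0: b"A", 1: b"B", 2: b"C"}, 2: {0: b"AA"}}
    dest, total_rounds = precopy_migrate(
        source, lambda r: workload.get(r, {}))
    return dest == source, total_rounds
```

In this model a page modified while it is being copied is simply sent
again in the next round, so the destination matches the source once the
final round has run with the guest suspended.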

To detect the pages that were modified between rounds, we propose an
additional dirty bit (virtualization dirty bit) for each memory page.
This bit would be set every time the page's dirty bit is set. However,
this virtualization dirty bit is reset only when the page is migrated.
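
A minimal Python sketch of the two-bit idea follows. The class and
method names are hypothetical, and host_clears_dirty stands in for
whatever other consumer resets the ordinary dirty bit; the mail does
not name one. The sketch only shows why a second bit is needed: the
migration code must still see a page as modified even after the
ordinary dirty bit has been cleared by someone else.

```python
class Page:
    def __init__(self):
        self.dirty = False       # ordinary dirty bit
        self.virt_dirty = False  # proposed virtualization dirty bit


class DirtyTracker:
    def __init__(self, npages):
        self.pages = [Page() for _ in range(npages)]

    def guest_write(self, n):
        # Both bits are set whenever the page is modified.
        self.pages[n].dirty = True
        self.pages[n].virt_dirty = True

    def host_clears_dirty(self, n):
        # Assumed other consumer: clears only the ordinary bit.
        self.pages[n].dirty = False

    def collect_for_migration(self):
        # Return the modified pages and reset only the virtualization bit.
        modified = [i for i, p in enumerate(self.pages) if p.virt_dirty]
        for i in modified:
            self.pages[i].virt_dirty = False
        return modified
```

For example, if pages 1 and 3 are written and the ordinary bit of page
1 is then cleared, collect_for_migration() still reports both pages,
and a second call reports none until the guest writes again.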

The proposed implementation is split into two parts:
1. The first one, the warm migration, is just a wrapper around the
suspend/resume feature which, instead of saving the suspended state on
disk, sends it via the network to the destination host.
2. The second part, the live migration, uses the layer previously
presented, but sends the guest's memory in rounds, as described above.

The migration process works as follows:
1. we identify:
 - VM_NAME - the name of the virtual machine which will be migrated
 - SRC_IP - the IP address of the source host
 - DST_IP - the IP address of the destination host
 - DST_PORT - the port we want to use for migration (default is 24983)
2. we start a virtual machine on the destination host that will wait
for a migration. Here, we must specify SRC_IP (and the port we want to
open for migration, default is 24983).
e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
3. using bhyvectl on the source host, we start the migration process.
e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
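
As an illustration of how an address argument of the form HOST:PORT
with a default port could be interpreted, here is a small Python
sketch. It is not the parser used by bhyve or bhyvectl. The function
name parse_endpoint is hypothetical, falling back to 24983 when the
port is omitted is an assumption based on the default mentioned above,
and IPv6 literals are not handled.

```python
DEFAULT_MIGRATION_PORT = 24983  # default port mentioned in this mail


def parse_endpoint(arg, default_port=DEFAULT_MIGRATION_PORT):
    """Split 'HOST' or 'HOST:PORT' into a (host, port) tuple."""
    host, sep, port_text = arg.partition(":")
    if not host:
        raise ValueError("missing host in %r" % arg)
    if not sep:
        # No port given: fall back to the default (assumed behaviour).
        return host, default_port
    port = int(port_text)  # raises ValueError if not a number
    if not 1 <= port <= 65535:
        raise ValueError("port out of range in %r" % arg)
    return host, port
```

With this sketch, parse_endpoint("10.0.0.2") yields the default port,
while parse_endpoint("10.0.0.2:5000") yields port 5000.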

A full tutorial on this can be found here:
https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve

For sending the migration request to a virtual machine, we use the
same thread/socket that is used for suspend.
For receiving a migration request, we use an approach similar to the
resume process.

As some of you may remember seeing similar emails from us on the
freebsd-virtualization list, I'll present a brief history of this
project:
The first part of the project was the suspend/resume implementation
which landed in bhyve in 2020, under the BHYVE_SNAPSHOT guard
(https://reviews.freebsd.org/D19495).
After that, we focused on two tracks:
1. adding various suspend/resume features so we can remove the #ifdef
BHYVE_SNAPSHOT guard: multiple device support
(https://reviews.freebsd.org/D26387), Capsicum support
(https://reviews.freebsd.org/D30471), and a uniform file format
(https://reviews.freebsd.org/D29262; during the bhyve bi-weekly calls
at that time, we concluded that JSON was the most suitable format).
2. implementing the migration feature for bhyve. Since this one relies
on save/restore but does not modify its behaviour, we decided that we
could pursue both tracks in parallel.
We had various presentations in the FreeBSD Community on these topics:
AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019, BSDCan2020,
AsiaBSDCon2023.

The first patches for warm and live migration were opened in 2021:
https://reviews.freebsd.org/D28270,
https://reviews.freebsd.org/D30954. However, the general feedback on
these was that the patches were too big to be reviewed and that we
should split them into smaller chunks (this was also true for some of
the suspend/resume improvements). Thus, we split them into smaller
parts. Also, as things changed in bhyve (e.g., Capsicum support for
suspend/resume was added this year), we rebased and updated our
reviews.

Thank you,
Elena