From: Rob Wing <rob.fx907@gmail.com>
Date: Mon, 17 Jul 2023 09:08:27 -0800
Subject: Re: Warm and Live Migration Implementation for bhyve
To: Elena Mihailescu <elenamihailescu22@gmail.com>
Cc: Corvin Köhne <corvink@freebsd.org>, freebsd-virtualization@freebsd.org, Mihai Carabas, Matthew Grooms
I'm curious why the stream send bits are rolled into bhyve as opposed to
using netcat/ssh to do the network transfer? Sort of how one would do a
zfs send/recv between hosts.

On Monday, July 17, 2023, Elena Mihailescu wrote:
> Hi Corvin,
>
> On Mon, 3 Jul 2023 at 09:35, Corvin Köhne wrote:
> >
> > On Tue, 2023-06-27 at 16:35 +0300, Elena Mihailescu wrote:
> > > Hi Corvin,
> > >
> > > Thank you for the questions! I'll respond to them inline.
> > >
> > > On Mon, 26 Jun 2023 at 10:16, Corvin Köhne wrote:
> > > >
> > > > Hi Elena,
> > > >
> > > > thanks for posting this proposal here.
> > > >
> > > > Some open questions from my side:
> > > >
> > > > 1. How is the data sent to the target? Does the host send a
> > > > complete dump that the target parses? Or does the target request
> > > > the data piece by piece and the host sends each piece as a
> > > > response?
> > > >
> > > It's not a single dump of the guest's state; it's transmitted in
> > > steps. However, some parts may be migrated as one chunk (e.g., the
> > > emulated devices' state is transmitted as the buffer generated by
> > > the snapshot functions).
> > >
> >
> > How does the receiver know which chunk relates to which device? It
> > would be nice if you could start bhyve on the receiver side without
> > parameters, e.g. `bhyve --receive=127.0.0.1:1234`. For that, the
> > protocol has to carry some information about the device
> > configuration.
> >
>
> Regarding your first question, we send a chunk of data (a buffer) with
> the state and restore it on the destination in the same order it was
> saved; this relies on save/restore. We currently do not support
> migrating between different versions of suspend/resume or of the
> migration code.
>
> It would be nice to have something like `bhyve --receive=127.0.0.1:1234`,
> but I don't think it is possible at this point, mainly for two reasons:
> - the guest image must be shared (e.g., via NFS) between the source
> and destination hosts. If the mount points differ between the two,
> opening the disk at the destination will fail (also, we must assume
> the user gave an absolute path, since a relative one won't work)
> - if the VM uses a network adapter, we must specify the tap interface
> on the destination host (e.g., if the VM uses `tap0` on the source
> host, `tap0` may not exist on the destination host or may be in use
> by other VMs)
>
> > > I'll try to describe the protocol we have implemented for
> > > migration; it may partially answer the second and third questions.
> > >
> > > The destination host waits for the source host to connect (through
> > > a socket). After that, the source sends its system specifications
> > > (hw_machine, hw_model, hw_pagesize). If the source and destination
> > > hosts have identical hardware configurations, the migration can
> > > take place.
> > >
> > > Then, if we have live migration, we migrate the memory in rounds
> > > (i.e., we get a list of the pages that have the dirty bit set, send
> > > it to the destination so it knows which pages will be received,
> > > then send the pages through the socket; this process is repeated
> > > until the last round).
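> > >
> > > As a rough sketch of one such round (illustrative only:
> > > send_all() and vm_get_dirty_bitmap() are made-up names, not the
> > > actual bhyve interface; vm_map_gpa() is the usual libvmmapi
> > > mapping call):
> > >
> > > #include <sys/types.h>
> > > #include <sys/socket.h>
> > > #include <machine/param.h>   /* PAGE_SIZE */
> > > #include <stdint.h>
> > > #include <vmmapi.h>          /* struct vmctx, vm_map_gpa() */
> > >
> > > /* Hypothetical kernel interface: one dirty bit per guest page. */
> > > int vm_get_dirty_bitmap(struct vmctx *ctx, uint8_t *bitmap,
> > >     size_t npages);
> > >
> > > /* Loop until the whole buffer has been written to the socket. */
> > > static int
> > > send_all(int sock, const void *buf, size_t len)
> > > {
> > >         const uint8_t *p = buf;
> > >
> > >         while (len > 0) {
> > >                 ssize_t n = send(sock, p, len, 0);
> > >
> > >                 if (n <= 0)
> > >                         return (-1);
> > >                 p += n;
> > >                 len -= n;
> > >         }
> > >         return (0);
> > > }
> > >
> > > /*
> > >  * One pre-copy round: fetch the dirty-page bitmap, announce it to
> > >  * the destination so it knows which pages to expect, then stream
> > >  * the dirty pages in bitmap order.
> > >  */
> > > static int
> > > migrate_round(int sock, struct vmctx *ctx, uint8_t *bitmap,
> > >     size_t npages)
> > > {
> > >         if (vm_get_dirty_bitmap(ctx, bitmap, npages) != 0)
> > >                 return (-1);
> > >         if (send_all(sock, bitmap, (npages + 7) / 8) != 0)
> > >                 return (-1);
> > >         for (size_t i = 0; i < npages; i++) {
> > >                 if ((bitmap[i / 8] & (1 << (i % 8))) == 0)
> > >                         continue;
> > >                 void *page = vm_map_gpa(ctx, i * PAGE_SIZE,
> > >                     PAGE_SIZE);
> > >
> > >                 if (page == NULL ||
> > >                     send_all(sock, page, PAGE_SIZE) != 0)
> > >                         return (-1);
> > >         }
> > >         return (0);
> > > }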
> > >
> > > Next, we stop the guest's vcpus and send the remaining memory (for
> > > live migration) or the guest's memory starting at vmctx->baseaddr
> > > (for warm migration). Then, based on the suspend/resume feature, we
> > > get the state of the virtualized devices (the ones in kernel space)
> > > and send this buffer to the destination. We repeat this for the
> > > emulated devices as well (the ones in userspace).
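> > >
> > > As a sketch of that step (the framing here is an assumption for
> > > illustration; in our code the buffers come from the existing
> > > snapshot callbacks and the receiver relies on the save order), each
> > > state buffer could be sent as a length-prefixed record, reusing the
> > > send_all() helper from the sketch above:
> > >
> > > #include <sys/endian.h>   /* htobe64() */
> > >
> > > /*
> > >  * Send one device-state buffer as a length-prefixed record so the
> > >  * receiver knows how many bytes to pass to the matching restore
> > >  * function. The length travels in a fixed byte order.
> > >  */
> > > static int
> > > send_state_chunk(int sock, const void *buf, uint64_t len)
> > > {
> > >         uint64_t wire_len = htobe64(len);
> > >
> > >         if (send_all(sock, &wire_len, sizeof(wire_len)) != 0)
> > >                 return (-1);
> > >         return (send_all(sock, buf, (size_t)len));
> > > }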
> > >
> > > On the receiving host, we get the memory pages and place them at
> > > their corresponding positions in the guest's memory, use the
> > > restore functions to rebuild the state of the devices, and start
> > > the guest's execution.
> > >
> > > Excluding the guest's memory transfer, the rest is based on the
> > > suspend/resume feature. We snapshot the guest's state, but instead
> > > of saving the data locally, we send it over the network to the
> > > destination. On the destination host, we start a new virtual
> > > machine, but instead of reading the state from disk (i.e., the
> > > snapshot files), we receive it over the network from the source
> > > host.
> > >
> > > If the destination can properly resume the guest's activity, it
> > > sends an "OK" to the source host so it can destroy/remove the
> > > guest on its end.
> > >
> > > Both warm and live migration are based on "cold migration". Cold
> > > migration means we suspend the guest on the source host and restore
> > > the guest on the destination host from the snapshot files. Warm
> > > migration does the same over a socket, while live migration changes
> > > the way the memory is migrated.
> > >
> > > > 2. What happens if we add a new data section?
> > > >
> > > What are you referring to by a new data section? Is this question
> > > related to the third one? If so, see my answer below.
> > >
> > > > 3. What happens if the bhyve version differs on host and target
> > > > machine?
> > >
> > > The two hosts must be identical for migration; that's why we check
> > > the specifications of the two migration hosts against each other.
> > > They are expected to run the same version of bhyve and FreeBSD. We
> > > will add an additional check to the specification exchange to
> > > verify that both hosts run the same FreeBSD build.
> > >
> > > As long as changes in the virtual memory subsystem don't affect
> > > bhyve (and how the virtual machine sees/uses the memory), the
> > > migration constraints should only be those of suspend/resume. The
> > > state of the virtual devices is handled by the snapshot system, so
> > > if it is able to accommodate changes in the data structures, the
> > > migration process will not be affected.
> > >
> > > Thank you,
> > > Elena
> > >
> > > >
> > > > --
> > > > Kind regards,
> > > > Corvin
> > > >
> > > > On Fri, 2023-06-23 at 13:00 +0300, Elena Mihailescu wrote:
> > > > > Hello,
> > > > >
> > > > > This mail presents the migration feature we have implemented
> > > > > for bhyve. Any feedback from the community is much appreciated.
> > > > >
> > > > > We have opened a stack of reviews on Phabricator
> > > > > (https://reviews.freebsd.org/D34717) that is meant to split the
> > > > > code into smaller parts so it can be reviewed more easily. A
> > > > > brief history of the implementation can be found at the bottom
> > > > > of this email.
> > > > >
> > > > > The migration mechanism we propose needs two main components in
> > > > > order to move a virtual machine from one host to another:
> > > > > 1. the guest's state (vCPUs, emulated and virtualized devices)
> > > > > 2. the guest's memory
> > > > >
> > > > > For the first part, we rely on the suspend/resume feature. We
> > > > > call the same functions as suspend/resume, but instead of
> > > > > saving the data in files, we send it over the network.
> > > > >
> > > > > The most time-consuming aspect of migration is transmitting the
> > > > > guest memory. The UPB team has implemented two options to
> > > > > accomplish this:
> > > > > 1. Warm Migration: the guest's execution is suspended on the
> > > > > source host while the memory is sent to the destination host.
> > > > > This method is less complex but may cause extended downtime.
> > > > > 2. Live Migration: the guest continues to execute on the source
> > > > > host while the memory is transmitted to the destination host.
> > > > > This method is more complex but offers reduced downtime.
> > > > >
> > > > > The proposed live migration procedure (pre-copy live migration)
> > > > > migrates the memory in rounds:
> > > > > 1. In the initial round, we migrate all the guest memory (all
> > > > > pages that are allocated).
> > > > > 2. In the subsequent rounds, we migrate only the pages that
> > > > > were modified since the previous round started.
> > > > > 3. In the final round, we suspend the guest, then migrate the
> > > > > remaining pages modified since the previous round together with
> > > > > the guest's internal state (vCPUs, emulated and virtualized
> > > > > devices).
> > > > >
> > > > > To detect the pages that were modified between rounds, we
> > > > > propose an additional dirty bit (a virtualization dirty bit)
> > > > > for each memory page. This bit would be set every time the
> > > > > page's regular dirty bit is set; however, it is reset only when
> > > > > the page is migrated. A small sketch of this bookkeeping
> > > > > follows.
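> > > > >
> > > > > Illustrative only (the flat bitmap and the function names are
> > > > > invented; the real tracking lives in the virtual memory
> > > > > subsystem):
> > > > >
> > > > > #include <stdbool.h>
> > > > > #include <stddef.h>
> > > > > #include <stdint.h>
> > > > >
> > > > > /* Called whenever the regular dirty bit is set for page pfn:
> > > > >  * the migration bit shadows it. */
> > > > > static void
> > > > > note_page_dirty(uint8_t *migrate_bits, size_t pfn)
> > > > > {
> > > > >         migrate_bits[pfn / 8] |= 1 << (pfn % 8);
> > > > > }
> > > > >
> > > > > /* Cleared only here, i.e., once the page has been migrated. */
> > > > > static bool
> > > > > test_and_clear_migrate_dirty(uint8_t *migrate_bits, size_t pfn)
> > > > > {
> > > > >         bool was_dirty =
> > > > >             (migrate_bits[pfn / 8] & (1 << (pfn % 8))) != 0;
> > > > >
> > > > >         migrate_bits[pfn / 8] &= ~(1 << (pfn % 8));
> > > > >         return (was_dirty);
> > > > > }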
> > > > >
> > > > > The proposed implementation is split into two parts:
> > > > > 1. The first one, warm migration, is just a wrapper around the
> > > > > suspend/resume feature which, instead of saving the suspended
> > > > > state on disk, sends it over the network to the destination.
> > > > > 2. The second part, live migration, uses the layer previously
> > > > > presented, but sends the guest's memory in rounds, as described
> > > > > above.
> > > > >
> > > > > The migration process works as follows:
> > > > > 1. we identify:
> > > > >  - VM_NAME - the name of the virtual machine to be migrated
> > > > >  - SRC_IP - the IP address of the source host
> > > > >  - DST_IP - the IP address of the destination host
> > > > >  - DST_PORT - the port we want to use for migration (default
> > > > > is 24983)
> > > > > 2. we start a virtual machine on the destination host that will
> > > > > wait for a migration. Here, we must specify SRC_IP (and the
> > > > > port we want to open for migration, default 24983),
> > > > > e.g.: bhyve ... -R SRC_IP:24983 guest_vm_dst
> > > > > 3. using bhyvectl on the source host, we start the migration
> > > > > process,
> > > > > e.g.: bhyvectl --migrate=DST_IP:24983 --vm=guest_vm
> > > > >
> > > > > A full tutorial on this can be found here:
> > > > > https://github.com/FreeBSD-UPB/freebsd-src/wiki/Virtual-Machine-Migration-using-bhyve
> > > > >
> > > > > For sending the migration request to a virtual machine, we use
> > > > > the same thread/socket that is used for suspend. For receiving
> > > > > a migration request, we used an approach similar to the resume
> > > > > process.
> > > > >
> > > > > As some of you may remember seeing similar emails from us on
> > > > > the freebsd-virtualization list, I'll present a brief history
> > > > > of this project:
> > > > > The first part of the project was the suspend/resume
> > > > > implementation, which landed in bhyve in 2020 under the
> > > > > BHYVE_SNAPSHOT guard (https://reviews.freebsd.org/D19495).
> > > > > After that, we focused on two tracks:
> > > > > 1. adding various suspend/resume features (multiple device
> > > > > support - https://reviews.freebsd.org/D26387, CAPSICUM support
> > > > > - https://reviews.freebsd.org/D30471, and a uniform file
> > > > > format - during the bhyve bi-weekly calls, we concluded that
> > > > > JSON was the most suitable format at that time -
> > > > > https://reviews.freebsd.org/D29262) so we can remove the
> > > > > #ifdef BHYVE_SNAPSHOT guard.
> > > > > 2. implementing the migration feature for bhyve. Since this
> > > > > relies on save/restore but does not modify its behaviour, we
> > > > > considered that the two tracks could proceed in parallel.
> > > > > We have given various presentations to the FreeBSD community on
> > > > > these topics: AsiaBSDCon2018, AsiaBSDCon2019, BSDCan2019,
> > > > > BSDCan2020, AsiaBSDCon2023.
> > > > >
> > > > > The first patches for warm and live migration were opened in
> > > > > 2021: https://reviews.freebsd.org/D28270 and
> > > > > https://reviews.freebsd.org/D30954. However, the general
> > > > > feedback was that the patches were too big to review and
> > > > > should be split into smaller chunks (this was also true for
> > > > > some of the suspend/resume improvements). Thus, we split them
> > > > > into smaller parts. Also, as things changed in bhyve (e.g.,
> > > > > Capsicum support for suspend/resume was added this year), we
> > > > > rebased and updated our reviews.
> > > > >
> > > > > Thank you,
> > > > > Elena
> > > > >
> > > >
> >
> > --
> > Kind regards,
> > Corvin
>
> Thanks,
> Elena