From: Vitaliy Gusev
To: Corvin Köhne
Cc: virtualization@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: BHYVE SNAPSHOT image format proposal
Date: Tue, 6 Jun 2023 14:25:38 +0300
Message-Id: <8387AC83-6667-48E5-A3FA-11475EA96A5F@gmail.com>
In-Reply-To: <6b98da58a5bd8e83bc466efa20b5a900298210aa.camel@FreeBSD.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

Hi Corvin,

Thanks for your comments and advice.

My answers are below.

> On 5 Jun 2023, at 18:32, Corvin Köhne <corvink@FreeBSD.org> wrote:
>
> On Tue, 2023-05-23 at 19:05 +0300, Vitaliy Gusev wrote:
>> 2. HEADER PHYS format:
>>
>>    0 +-----------------------------------------+
>>      |         IDENT STRING - 64 BYTES         |
>>   64 +-----------------------------------------+
>>      | NVLIST SIZE - 4 BYTES |  NVLIST DATA    |
>>   72 +-----------------------------------------+
>>      |                                         |
>>      |               NVLIST DATA               |
>>      |                                         |
>> 4096 +-----------------------------------------+
>>
>>>
>>> IDENT STRING - Each producer can set its own value to specify
>>> image.
>>> NVLIST SIZE - The following packed header nvlist data size.
>>> NVLIST DATA - Packed nvlist header data.
>>>
>>> 4KB should be enough for the HEADER to keep basic information about
>>> Sections. However, it can
>>> be enlarged lately, without breaking backward compatibility.
>>>

> I can't see an advantage of using a fixed sized header of 4KB. You have
> to parse the offset and size of every section anyways. If it's for
> alignment requirements you can still align all sections on save and set
> the offset accordingly. So, why complicating things by using a fixed
> header size?

You are right about the 4KB restriction. I will correct it in the updated
format proposal. The idea is to reserve enough space for the HEADER and to
write it at the beginning of the snapshot file after all other stages have
finished.

The implementation (the snapshot path) should know the estimated maximum
size of the header and can reserve that maximum. Currently 4KB is enough,
and it can easily be increased in bhyve's code without any problem.

Alignment is useful for debugging and for inspecting a snapshot image file.
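
The "reserve space, write sections, then write the HEADER last" order
described above can be sketched as follows. This is only an illustration in
Python, not bhyve code (bhyve is C). Several details are assumptions that the
proposal does not fix: the function and constant names are hypothetical, the
4-byte size field is taken to be little-endian and to sit at byte 64 with the
data directly after it, sections are aligned to 4KB, and JSON stands in for
the packed nvlist so that the sketch runs without libnv.

```python
import io
import json
import struct

IDENT_SIZE = 64          # from the proposal: IDENT STRING is 64 bytes
HEADER_RESERVED = 4096   # current reservation; assumed to be adjustable
SECTION_ALIGN = 4096     # assumption: every section starts on a 4KB boundary


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def write_snapshot(out, ident, sections):
    """Write all sections first, then the header at offset 0.

    sections maps a section name to a (type, payload_bytes) pair.
    Returns the section index that was stored in the header.
    """
    ident_bytes = ident.encode("ascii")
    if len(ident_bytes) > IDENT_SIZE:
        raise ValueError("IDENT string longer than 64 bytes")

    index = {}
    offset = HEADER_RESERVED
    for name, (kind, payload) in sections.items():
        offset = align_up(offset, SECTION_ALIGN)
        out.seek(offset)
        out.write(payload)
        index[name] = {"offset": offset, "size": len(payload), "type": kind}
        offset += len(payload)

    # Stand-in for a packed nvlist; JSON only keeps the sketch runnable.
    packed = json.dumps(index).encode("ascii")
    header = (ident_bytes.ljust(IDENT_SIZE, b"\0")
              + struct.pack("<I", len(packed))   # assumed little-endian
              + packed)
    if len(header) > HEADER_RESERVED:
        raise ValueError("header does not fit into the reserved space")

    # The header is written last, into the space reserved at the start.
    out.seek(0)
    out.write(header)
    return index
```

With a single 502-byte "text" section, this places the section at offset
0x1000, matching the example quoted later in this message.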


> The IDENT STRING seems to be very large. Even a GUID which should be a
> global unique identifier uses just 16 Bytes. Additionally, it might be
> worth using a dedicated ident and version field for an easier version
> parsing. E.g.:

The intention is to reserve enough space for future versions and to let
other producers and companies specify their own ID string, possibly with
add-on information. So reserving 64 bytes for the future is not a huge
price to pay, and it can be very useful.

During resume, if the IDENT string is not the same as bhyve's own, resume
can fail before parsing any other data, because the internal format may not
be what is expected.

I would rather not fix the IDENT string format and just apply this rule:

During resume, bhyve compares its own IDENT string with the IDENT string
from the snapshot image. If they are not the same, no further assumptions
about the format can be made, and resume should fail.
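
A minimal sketch of that rule, again in Python only for illustration. The
function name is hypothetical, the IDENT value is the one from the example
quoted below, and NUL padding of the 64-byte field is an assumption (the
proposal does not say how a shorter string is padded).

```python
IDENT_SIZE = 64
# IDENT value taken from the example in this thread.
BHYVE_IDENT = b"BHYVE CHECKPOINT IMAGE VERSION 1"


def ident_matches(image_prefix, expected=BHYVE_IDENT):
    """Return True only if the first 64 bytes equal the expected IDENT.

    Assumption: the IDENT field is NUL-padded to its full 64 bytes.
    """
    if len(image_prefix) < IDENT_SIZE:
        return False
    return image_prefix[:IDENT_SIZE] == expected.ljust(IDENT_SIZE, b"\0")
```

A resume path would call this on the start of the image and stop before
reading the header nvlist if it returns False.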


> +------------------+-------------------+
> | IDENT - 56 BYTES | VERSION - 8 BYTES |
> +------------------+-------------------+
>
> IDENT - "BHYVE CHECKPOINT IMAGE"
> VERSION - 1 (as uint64_t)
>
> Btw: I don't care but here we could leave some free space for possible
> forward compatibility. E.g.:
>
> +------------------+-------------------+-------------------------+
> | IDENT - 16 BYTES | VERSION - 8 BYTES | _FREE_SPACE_ - 40 BYTES |
> +------------------+-------------------+-------------------------+
...
>> 4. EXAMPLE:
>>
>>
>>  IDENT STRING:
>>
>>    "BHYVE CHECKPOINT IMAGE VERSION 1"
>>
>>  NVLIST HEADER:
>>
>>    [config]
>>        config.offset = 0x1000 (4096)
>>        config.size = 0x1f6 (502)
>>        config.type = "text"
>>


> Not sure if it's just an example for the "text" type. bhyve converts it
> into a nvlist, so it could be saved directly as nvlist.
> Btw: I would only implement the "text" type if there's an usecase that
> can't be solved by one of the other types.


The intention is to use the current engine to dump bhyve's config and to
read the config from a file (-k option).

The advantage of the "text" type is its simple implementation, and it
serves as an example of the flexibility of the proposed image format. An
image file can hold any types that a producer would like to use: text,
nvlist, binary, diff-pages, etc.
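
To illustrate how a reader could handle mixed section types, here is a small
Python sketch. The entry fields (offset, size, type) come from the example
header above; the function name, the dict representation of an entry, and the
choice to decode "text" as UTF-8 are assumptions made for the illustration.

```python
def read_section(image, entry):
    """Return the payload described by one header entry.

    entry is a dict with "offset", "size" and "type", mirroring the
    config.offset / config.size / config.type keys from the example.
    """
    start = entry["offset"]
    end = start + entry["size"]
    if end > len(image):
        raise ValueError("section extends past the end of the image")
    payload = image[start:end]
    if entry["type"] == "text":
        return payload.decode("utf-8")  # assumed encoding for "text"
    # nvlist, binary, diff-pages, ...: returned undecoded in this sketch.
    return payload
```

A producer-specific type that the reader does not know is simply returned as
raw bytes here; a real implementation might prefer to reject it.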


> All in all, it looks good. Keep on your work!
>
> Regards checksum feature:
> We should focus on enabling this feature by default before adding
> advanced features. So, keep it simple and small.

Could you give an example of what you meant by the "checksum" feature? Did
you mean something like TAR's checksum, i.e. covering only the header?



> Regards forward compatibility:
> Backward compatibility is way more important than forward
> compatibility. Nevertheless, forward compatibility would be nice to
> have. So, we should keep it in mind when modifying the layout. For the
> moment, just focus on a format which is backward compatible.


It seems that having information about forward compatibility could be very
useful, at least to learn in advance that a restore is impossible. I will
add it while implementing this format.

Thanks,
Vitaliy Gusev
