Re: BHYVE SNAPSHOT image format proposal

From: Corvin Khne <corvink_at_FreeBSD.org>
Date: Mon, 12 Jun 2023 12:52:38 UTC
On Wed, 2023-06-07 at 21:25 +0300, Vitaliy Gusev wrote:
> Hi Corvin, 
> 
> > On 6 Jun 2023, at 15:59, Corvin Köhne <corvink@FreeBSD.org> wrote:
> > > ...
> > 
> > We may have different version of the format from the same produce.
> > IMHO, it makes sense to have a dedicated IDENT and VERSION field to
> > easily figure out
> > 
> > 1) if the producer of the image is known
> > 2) if we support that version of the producer
> > 
> > Even if you allocated a huge amount of free space, someone would
> > need
> > more. So, what do you think about this format:
> > 
> > +------------------------------------------------------------------
> > ---+
> > | IDENT - 16 BYTES
> >                                                    |
> > +-------------------+-----------------------+----------------------
> > ---+
> > | VERSION - 4 BYTES | NVLIST SIZE - 4 BYTES | NVLIST OFFSET - 8
> > BYTES |
> > +-------------------+-----------------------+----------------------
> > ---+
> > | POSSIBLE FREE SPACE (e.g. for custom data, alignment etc.)
> >          |
> > +------------------------------------------------------------------
> > ---+
> > | NVLIST DATA
> >                                                         |
> > +------------------------------------------------------------------
> > ---+
> > | POSSIBLE FREE SPACE (for whatever reason)
> >                           |
> > +------------------------------------------------------------------
> > ---+
> > | SNAPSHOT DATA
> >                                                       |
> > +——————————————————————————————————+
> > 
> 
> Note, simple string "BHYVE CHECKPOINT IMAGE” has 22 bytes. So 16
> bytes seems
> too small. 
> 

I don't care about the final string but we can just use another string
like "BHYVE SNAPSHOT" which fits into 16 bytes.

> So I would not to complicate a header first. 
> 
> I would rather describe ideas, conditions and then solutions:
> 
>    1. Need to distinguish snapshot image file from other files. 
>       Solution: Header should have "magic id”.
> 

That what your previous IDENT field was for.

>       
>    2. Need a barrier for resuming if image "is not ours”. Idea is not
> to allow to resume images from other producers.
>       
>       The reason to have it and use it instead of header versioning: 
>       
>       Imagine that mainstream has its own implementation and
> company’s fork repo has its own implementation. How to ensure
> that the versions in an image file are ours and not somebody’s else?
>       

Companies should use the upstream version. If it doesn't fit their use
case, they should work on improving the upstream version.

If a company decides that it needs an own image format, they can change
the "magic id", upstream bhyve stops processing it and we don't have to
care about it any more.

As I can't see a use case for this field, I would drop it.

>       Solution:  Header should have “Producer id” string.
>       
>       Example: Snapshot image file has empty Producer string , but
> bhyve has current Producer as “MYCOMPANY”. Strings are not equal,
> resume must fail.
>       
>    3. The Rule above does not restrict getting/decoding data from an
> image file. It should be possible to look at an image file and
> analyse internals, to get/decode values, etc.
>       Solution: Have additional option either in bhyve or bhyvectl to
> get into image file.
> 

Agreed, that it would be great to have some tools for analysing the
image file.


>    4. Following nvlist header data should have a short information
> about image file and its internals.
>       Solution: NV HEADER can have several sections: “config”,
> “kernel”, “devices”, “memory”, …
>       

Agreed.

>    5. Versioning of NV HEADER. Idea is to have an information in
> advance whether it is possible to be resumed or not. In other words,
> before do resume, get information about ability to resume.
>       
>       Solution: Each Section should have “version”  and “subversion”.
> While “version” is responsible for both  types of compatibility:
> backward and forward, “subversion” is for forward compatibility only.
>       

IMHO, that's complicating the format too much. Now you have to check
two version fields for each section and you have to check if all
required sections are present.

It would be much easier with a single version field in the header. In
that case, bhyve can check if the version is supported. That's a single
check!

>       Rules for check:
>               If bh_version == version && bh_subversion >= subversion
>  then
>                         Bhyve able to resume the Section
>               Else
>                         Bhyve cannot resume the Section
>               Endif
>       
>       Example 1: Section in image has “version=1", “subversion=5”,
>  Bhyve has “version=1", “subversion=6". That means, bhyve can resume
> the Section.
>       
>       Example 2: The same image Section, but bhyve has “version=1",
> “subversion=4". Bhyve cannot resume the Section.
>       
>       Example 3: The same image Section, but bhyve has “version=2",
> “subversion=5”. Bhyve cannot resume the Section.
>       
>       Rules for increasing versions:
>         -  If during code-change “backward” compatibility is broken,
> “version” should be increased and “subversion” is set to 0.
>         -  If during code-change “forward” compatibility is broken,
> “subversion” should be increased.
>       
>    6. Other versioning in HEADER is redundant. If something is
> changed in the format, “magic id” can be changed appropriately.
>       Solution: “magic id” should be stable and not changed for a
> long time.
> 

"magic id" should never change!

> 
> As result I would suggest to give at least 32 bytes for "magic id” /
> ident and 32 bytes for “producer id”.
> Format of entire image file can be:
> 
> 
> +-----------------------------------------------------------------+
> |               HEADER MAGIC ID     - 32 BYTES                    |
> +-----------------------------------------------------------------+
> |               HEADER PRODUCER ID  - 32 BYTES                    |
> +———————————————----------------————————--—-----------------------+
> |               NVLIST HEADER SIZE  -  4 BYTES                    | 
> +-----------------------------------------------------------------+
> |               NVLIST HEADER DATA (SECTIONS)                     |
> +-----------------------------------------------------------------+
> |                       SNAPSHOT DATA                             |
> +-----------------------------------------------------------------+
> 
> 
> MAGIC ID: should be hardcoded: "BHYVE CHECKPOINT IMAGE”.
> 
> PRODUCER ID: can be empty and supported by producer, i.e. reserved. 
> 
> NVLIST HEADER SIZE: has enough dimension, but in general size is less
> than 4KB
> 
> NVLIST HEADER DATA: Packed nvlist data, contains Sections:  “config”,
> “kernel”, “devices”, “memory”, … :
> 
> > [config]
> >         offset = 0x1000 (4096)
> >         size = 0x1f6 (502)
> >         type = text
> >  vers = 1
> >  subvers = 5
> > [kernel]
> >         offset = 0x11f6 (4598)
> >         size = 0x19a7 (6567)
> >         type = nvlist
> >  vers = 1
> >  subvers = 0
> > [devices]
> >         offset = 0x2b9d (11165)
> >         size = 0x10145ba (16860602)
> >         type = nvlist
> >  vers = 2
> >  subvers = 1
> > [memory]
> >         offset = 0x1200000 (18874368)
> >         size = 0x3ce00000 (1021313024)
> >         type = pages
> >  vers = 1
> >  subvers = 0 
> 
> 
> I hope I gained a whole understanding.
> Thanks,
> Vitaliy Gusev
> 

Btw: We can add feature bits to the nvlist header data. This makes it
unneccessary to leave space for forward compatibility.


-- 
Kind regards,
Corvin