Re: BHYVE SNAPSHOT image format proposal
Date: Wed, 24 May 2023 17:33:48 UTC
@gusev.vitaliy@gmail.com: Can you explain to me how to test the new
"snapshot" feature? I'm interested in testing and stressing it on my system.
Is it ready to be used?

On Wed, May 24, 2023 at 5:11 PM Vitaliy Gusev <gusev.vitaliy@gmail.com> wrote:

> Hi Tomek,
>
> I will try to answer all the questions below; please let me know if I
> miss something important.
>
> On 23 May 2023, at 21:58, Tomek CEDRO <tomek@cedro.info> wrote:
>
> > On Tue, May 23, 2023 at 6:06 PM Vitaliy Gusev wrote:
> >
> > > Hi,
> > > Here is a proposal for bhyve snapshot/checkpoint image format
> > > improvements. It implies moving snapshot code to nvlist engine.
> >
> > Hey there Vitaliy :-) bhyve getting more and more traction, I am new
> > user of bhyve and no expert, but new and missing features are welcome
> > I guess.. there was a discussion on the mailing lists recently on
> > better snapshots mechanism :-)
> >
> > > Current snapshot implementation has disadvantages:
> > > 3 files per snapshot: .meta, .kern, vram
> >
> > No problem, unless new single file will be protected against
> > corruption (filesystem, transfer, application crash) and possible to
> > be easily and cheaply modified in place?
>
> The current snapshot implementation doesn't have that. I would go further:
> the current pkg implementation doesn't track or notify if some files are
> changed. Binary files on a system, for example ELF files, can be changed
> without any notification.
>
> Tar doesn't protect the data it keeps. Some filesystems, like ZFS,
> guarantee that data is not modified by the underlying disks.
>
> Protection requires more effort, and its purpose should be clearly
> defined. If the purpose is a checksum with 99.9% reliability, the NVLIST
> HEADER can be widened to have a "checksum" key/value for a Section.
>
> If the purpose is crypto verification - I believe the sha256 program
> should be your choice.
>
> > > Binary Stream format of data.
> >
> > This is small and fast? Will new format too?
>
> Small is not so perfect.
> As a first attempt the snapshot code is good. But if you want to get
> values related to some specific device, for example a NIC or the HPET,
> you cannot get them easily. Please try :)
>
> A stream doesn't have flexibility. It is good for well-specified,
> long-discussed protocols like XDR (NFS), which have an RFC where each
> position in the stream is described. Example: RFC1813.
>
> The new format with NVLIST has flexibility and is fast enough. Note, ZFS
> uses nvlist for keeping attributes and many other things.
>
> > > Adding optional variable - breaks resume
> > > Removing variable - breaks resume
> > > Changing saved order of variables - breaks resume
> >
> > Obviously need improvement :-)
> >
> > > Hard to get information about what is saved and decode.
> > > Hard to debug if something goes wrong
> >
> > Additional tools missing? Will new format allow text editor interaction?
>
> Why do you need to modify a snapshot image? Could you describe more? Do
> you modify the current 3 snapshot files?
>
> > > No versions. If the code changes, resume of an old image can pass,
> > > but with UB.
> >
> > Is new format future proof and provides backward compatibility?
>
> The intention of moving to the new format is to have backward
> compatibility if some code is changed:
>
> - Adding optional variable
> - Removing variable that is not used anymore
> - Change order of saving variables
> - "Hot Fixes".
>
> If changes are critical and incompatible, the restore stage should have
> clear information about the incompatibility and break resume. Ideally it
> should be informed even before starting the restore process. For this
> purpose, the new format introduces versions.
>
> > > New nvlist implementation should solve all things above. The first
> > > step - improve snapshot/checkpoint saving format. It eliminates three
> > > files usage per a snapshot.
> > >
> > > (..)
> >
> > So this will be new text config based format with variable = value and
> > sections?
>
> This is the NVLIST approach with key=value, where the key is a string and
> the value can be an integer, array, string, etc.
>
> > How much bigger will be the overall file size increase?
>
> Not so huge. NVLIST internals are well specified. For example, for my VM:
>
>   [kernel]
>   kernel.offset = 0x11f6 (4598)
>   kernel.size = 0x19a7 (6567)
>   kernel.type = "nvlist"
>
>   [devices]
>   devices.offset = 0x2b9d (11165)
>   devices.size = 0x10145ba (16860602)
>   devices.type = "nvlist"
>
> So the packed size for *kernel* is 6567 bytes, and for *devices* it is
> 16860602 bytes including the 16MB framebuffer. Without fbuf, the packed
> nvlist devices Section has a size of 83386 bytes.
>
> > How much longer it will take to decode/encode/process files?
>
> It is fast, just several milliseconds. NVLIST is a very fast format. It is
> already integrated into bhyve as the Config engine.
>
> > What is the possibility of format change and backward/forward
> > compatibility?
>
> If you are talking about compatibility of the image format - it should be
> compatible in both directions, at least for not so big format changes.
>
> If we consider overall snapshot/resume compatibility - I believe forward
> compatibility is not the case and not a target. Indeed, why do you need to
> resume an image created by a higher version of a program?
>
> The most important thing is backward compatibility, i.e. when an image is
> created by an older version of a program, but should be resumed on a new
> one.
>
> This is the target and intention of this improvement.
>
> > Have you considered efficiency comparison of current format, proposed
> > format, and maybe using SQLITE or JSON storage/parsers? For instance
> > sqlite would be blazingly fast but hard to migrate. json would be most
> > versatile but more time/memory consuming?
>
> Yes, I know about other formats, like JSON and others. NVLIST is the most
> effective and suitable for the current purposes.
>
> > Maybe EFL approach of storing configuration files for limited
> > resources embedded system storage that use binary storage data but can
> > be decompressed in chunks that can be replaced in place?
> > https://www.enlightenment.org/develop/efl/start
>
> There are many things that can be used, but it should be well known, easy,
> stable, fast and supportable. I believe NVLIST is the best choice.
>
> > Sorry for asking those questions but there may be already good and
> > verified solutions out there not to reinvent the wheel? :-)
>
> Thank you for your questions. If you would like, you can try to test the
> new implementation and give feedback.
>
> ---
> Vitaliy Gusev

--
Mario.
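
For readers following the thread, here is a rough sketch of the layout discussed above: a versioned header that indexes each section by offset, size and type, with an optional per-section checksum, and a restore step that reads values by key rather than by position. This is not the bhyve code and not the libnv wire format. JSON stands in for a packed nvlist, and every name here (`pack_image`, `load_section`, `restore_hpet`, `FORMAT_VERSION`, the `hpet` keys) is made up for illustration.

```python
import json
import struct
import zlib

FORMAT_VERSION = 1  # hypothetical image format version


def pack_image(sections):
    """Pack {name: dict} as: 4-byte header length, header, section payloads."""
    # JSON is only a stand-in here; the proposal stores packed nvlists.
    payloads = {name: json.dumps(vals).encode() for name, vals in sections.items()}
    header = {"version": FORMAT_VERSION, "sections": {}}
    offset = 0  # in this sketch, offsets are relative to the end of the header
    for name, blob in payloads.items():
        header["sections"][name] = {
            "offset": offset,
            "size": len(blob),
            "type": "json",  # the proposal uses "nvlist" here
            "checksum": zlib.crc32(blob),  # optional integrity check
        }
        offset += len(blob)
    head = json.dumps(header).encode()
    return struct.pack("<I", len(head)) + head + b"".join(payloads.values())


def load_section(image, name):
    """Check the version first, then locate one section through the index."""
    (head_len,) = struct.unpack_from("<I", image, 0)
    header = json.loads(image[4:4 + head_len])
    if header["version"] > FORMAT_VERSION:
        # Incompatibility is reported before any state is restored.
        raise ValueError("image was created by a newer format version")
    entry = header["sections"][name]
    start = 4 + head_len + entry["offset"]
    blob = image[start:start + entry["size"]]
    if zlib.crc32(blob) != entry["checksum"]:
        raise ValueError(f"section {name!r} is corrupted")
    return json.loads(blob)


def restore_hpet(values):
    """Restore by key: unknown keys are ignored, optional keys get defaults."""
    return {
        "counter": values["counter"],            # required key
        "enabled": values.get("enabled", True),  # optional key added later
    }


if __name__ == "__main__":
    # An "old" image: it has a field the new code dropped and lacks "enabled".
    image = pack_image({"hpet": {"legacy_field": 7, "counter": 1234}})
    print(restore_hpet(load_section(image, "hpet")))
```

Because the restore step looks values up by key, adding an optional variable, dropping an unused one, or changing the save order does not break resume of an older image, which a positional binary stream cannot offer.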