BHYVE SNAPSHOT image format proposal
Date: Tue, 23 May 2023 16:05:31 UTC
Hi,

Here is a proposal for improving the bhyve snapshot/checkpoint image format.
It implies moving the snapshot code to the nvlist engine.

The current snapshot implementation has the following disadvantages:

- 3 files per snapshot: .meta, .kern, vram.
- Binary stream format of data:
    - Adding an optional variable breaks resume.
    - Removing a variable breaks resume.
    - Changing the saved order of variables breaks resume.
- Hard to get information about what is saved and to decode it.
- Hard to debug if something goes wrong.
- No versions. If the code changes, resuming an old image may appear to
  succeed, but with undefined behavior (UB).

The new nvlist implementation should solve all of the issues above.

The first step is to improve the snapshot/checkpoint saving format, which
eliminates the use of three files per snapshot.

1. BHYVE SNAPSHOT image format:

+---------------------------------------+
|       HEADER PHYS - 4096 BYTES        |
+---------------------------------------+
|                                       |
|                 DATA                  |
|                                       |
+---------------------------------------+

2. HEADER PHYS format:

   0 +-----------------------------------------+
     |         IDENT STRING - 64 BYTES         |
  64 +-----------------------------------------+
     | NVLIST SIZE - 4 BYTES |   NVLIST DATA   |
  72 +-----------------------------------------+
     |                                         |
     |               NVLIST DATA               |
     |                                         |
4096 +-----------------------------------------+

IDENT STRING - Each producer can set its own value to identify the image.
NVLIST SIZE  - Size of the packed header nvlist data that follows.
NVLIST DATA  - Packed nvlist header data.

4KB should be enough for the HEADER to keep basic information about
Sections. However, it can be enlarged later without breaking backward
compatibility.

3. NVLIST HEADER consists of Sections in the following format:

Name   - string
Type   - string: "text" - plain text, "nvlist" - packed nvlist,
         "binary" - raw binary data
Size   - size of the section - uint64
Offset - offset of the section in the image - uint64

Predefined sections: "config", "devices", "kernel", "memory".

4. EXAMPLE:

IDENT STRING:

"BHYVE CHECKPOINT IMAGE VERSION 1"

NVLIST HEADER:

[config]
config.offset = 0x1000 (4096)
config.size = 0x1f6 (502)
config.type = "text"

[kernel]
kernel.offset = 0x11f6 (4598)
kernel.size = 0x19a7 (6567)
kernel.type = "nvlist"

[devices]
devices.offset = 0x2b9d (11165)
devices.size = 0x10145ba (16860602)
devices.type = "nvlist"

[memory]
memory.offset = 0x1200000 (18874368)
memory.size = 0x3ce00000 (1021313024)
memory.type = "binary"

SECTIONS:

[section "config" size 0x1f6 offset 0x1000]:

memory.size=1024M
x86.strictmsr=true
x86.vmexit_on_hlt=true
cpus=2
acpi_tables=true
pci.0.0.0.device=hostbridge
pci.0.31.0.device=lpc
pci.0.4.0.device=virtio-net
pci.0.4.0.backend=tap0
pci.0.7.0.device=fbuf
pci.0.7.0.tcp=10.42.0.78:5900
pci.0.7.0.w=1024
pci.0.7.0.h=768
pci.0.5.0.device=ahci
pci.0.5.0.port.0.type=cd
pci.0.5.0.port.0.path=/ISO/ubuntu-22.04.1-live-server-amd64.iso
lpc.bootrom=/usr/local/share/uefi-firmware/BHYVE_UEFI.fd
checkpoint.date="Wed Jan 25 23:48:29 2023"
name=ubuntu22

[section "kernel" size 0x19a7 offset 0x11f6]:

[vm]
vm.vds_version = 0x1 (1)
vm.cpu0.data(BINARY): 00 00 00 00 0D 00 00 00 01 00 00 00 00 00 00 00 ... size=0x28
vm.cpu1.data(BINARY): 00 00 00 00 0D 00 00 00 01 00 00 00 00 00 00 00 ... size=0x28
vm.checkpoint_tsc = 0xe2e0ac6fbe456 (3991273496896598)

[hpet]
hpet.vds_version = 0x1 (1)
hpet.data(BINARY): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... size=0x118

[vmx]
vmx.vds_version = 0x1 (1)
vmx.cpu_features = 0 (0)
vmx.cpu0.vmx_data(BINARY): F0 CC 15 B8 FF FF FF FF 40 B4 21 B9 FF FF FF FF ... size=0x288
vmx.cpu1.vmx_data(BINARY): F0 CC 15 B8 FF FF FF FF 00 00 67 41 D8 9B FF FF ... size=0x288

[ioapic]
ioapic.vds_version = 0x1 (1)
ioapic.data(BINARY): 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ... size=0x208

[lapic]
lapic.vds_version = 0x1 (1)
lapic.cpu0.data(BINARY): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... size=0x460
lapic.cpu1.data(BINARY): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... size=0x460

[atpit]
atpit.vds_version = 0x1 (1)
atpit.data(BINARY): 00 00 00 00 00 00 00 00 54 AD 51 97 0F 0E 00 00 ... size=0xa0

[atpic]
atpic.vds_version = 0x1 (1)
atpic.data(BINARY): 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ... size=0x84

[pmtimer]
pmtimer.vds_version = 0x1 (1)
pmtimer.uptime = 0x26fd133e5cc (2679274464716)

[rtc]
rtc.vds_version = 0x1 (1)
rtc.data(BINARY): 0A 00 00 00 00 FE FF FF 10 35 13 3D 40 F7 14 00 ... size=0x98

Thanks,
Vitaliy Gusev
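
P.S. For anyone who wants to experiment with the HEADER PHYS layout from section 2 and the section table from section 3, here is a minimal Python sketch. It is not the bhyve implementation. Several things in it are assumptions that the proposal does not state: the 4-byte NVLIST SIZE is treated as a little-endian unsigned integer, the packed nvlist is assumed to start immediately after it (offset 68), unused header bytes are zero-filled, and the packed nvlist itself is handled as opaque bytes. The names `build_header`, `parse_header`, and `check_sections` are hypothetical helpers made up for this illustration.

```python
from __future__ import annotations

import struct

HEADER_SIZE = 4096   # HEADER PHYS size (section 1)
IDENT_SIZE = 64      # IDENT STRING field size (section 2)
SIZE_FMT = "<I"      # ASSUMPTION: NVLIST SIZE is 4 bytes, little-endian, unsigned
# ASSUMPTION: packed nvlist data begins right after the size field (offset 68).
NVLIST_OFF = IDENT_SIZE + struct.calcsize(SIZE_FMT)


def build_header(ident: str, packed_nvlist: bytes) -> bytes:
    """Lay out ident string, nvlist size, and nvlist data in a 4096-byte header."""
    ident_raw = ident.encode("ascii")
    if len(ident_raw) > IDENT_SIZE:
        raise ValueError("ident string longer than 64 bytes")
    if NVLIST_OFF + len(packed_nvlist) > HEADER_SIZE:
        raise ValueError("packed nvlist does not fit in the header")
    header = ident_raw.ljust(IDENT_SIZE, b"\0")           # zero-padded ident
    header += struct.pack(SIZE_FMT, len(packed_nvlist))   # NVLIST SIZE
    header += packed_nvlist                               # NVLIST DATA (opaque)
    return header.ljust(HEADER_SIZE, b"\0")               # zero-fill to 4096


def parse_header(header: bytes) -> tuple[str, bytes]:
    """Return (ident string, packed nvlist bytes) from a header blob."""
    if len(header) < HEADER_SIZE:
        raise ValueError("short header")
    ident = header[:IDENT_SIZE].rstrip(b"\0").decode("ascii")
    (size,) = struct.unpack_from(SIZE_FMT, header, IDENT_SIZE)
    if NVLIST_OFF + size > HEADER_SIZE:
        raise ValueError("NVLIST SIZE exceeds header")
    return ident, header[NVLIST_OFF:NVLIST_OFF + size]


def check_sections(sections: dict[str, tuple[int, int]]) -> None:
    """sections maps name -> (offset, size).

    Raise ValueError if a section starts inside the header or overlaps
    the section that precedes it in the image.
    """
    end = HEADER_SIZE
    for name, (offset, size) in sorted(sections.items(), key=lambda kv: kv[1][0]):
        if offset < end:
            raise ValueError(f"section {name!r} overlaps preceding data")
        end = offset + size
```

Calling `check_sections` with the offsets and sizes from the example in section 4 passes: "config", "kernel", and "devices" are packed back to back, and "memory" starts at a later, aligned offset. A reader built this way rejects a truncated or overlapping image before it tries to unpack any section.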