Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 09 Nov 2022 15:54:33 UTC
On Wed, Nov 9, 2022 at 5:46 AM Alexander Leidinger <Alexander@leidinger.net>
wrote:

> Quoting Alexander Leidinger <Alexander@leidinger.net> (from Tue, 08
> Nov 2022 10:50:53 +0100):
>
> > Quoting Warner Losh <imp@bsdimp.com> (from Mon, 7 Nov 2022 14:23:11
> -0700):
> >
> >>
> >>
> >>       On Mon, Nov 7, 2022 at 4:15 AM Alexander Leidinger
> >> <Alexander@leidinger.net> wrote:
> >>
> >>> Quoting Li-Wen Hsu <lwhsu@freebsd.org> (from Mon, 7 Nov 2022 03:39:19
> GMT):
> >>>
> >>>> The branch main has been updated by lwhsu:
> >>>>
> >>>> URL:
> >>>>
> https://cgit.FreeBSD.org/src/commit/?id=72a1cb05cd230ce0d12a7180ae65ddbba2e0cb6d
> >>>>
> >>>> commit 72a1cb05cd230ce0d12a7180ae65ddbba2e0cb6d
> >>>> Author:     Li-Wen Hsu <lwhsu@FreeBSD.org>
> >>>> AuthorDate: 2022-11-07 03:30:09 +0000
> >>>> Commit:     Li-Wen Hsu <lwhsu@FreeBSD.org>
> >>>> CommitDate: 2022-11-07 03:30:09 +0000
> >>>>
> >>>>      rc(8): Add a zpoolupgrade rc.d script
> >>>>
> >>>>      If a zpool is created by makefs(8), its version is 5000, i.e.,
> all
> >>>>      feature flags are off.  Introduce an rc script to run `zpool
> upgrade`
> >>>>      over the assigned zpools on the first boot.  This is useful to
> the
> >>>>      ZFS based VM images built from release(7).
> >>>
> >>>> diff --git a/share/man/man5/rc.conf.5 b/share/man/man5/rc.conf.5
> >>>> index f9ceabc83120..43fa44a5f1cb 100644
> >>>> --- a/share/man/man5/rc.conf.5
> >>>> +++ b/share/man/man5/rc.conf.5
> >>>> @@ -24,7 +24,7 @@
> >>>>   .\"
> >>>>   .\" $FreeBSD$
> >>>>   .\"
> >>>> -.Dd August 28, 2022
> >>>> +.Dd November 7, 2022
> >>>>   .Dt RC.CONF 5
> >>>>   .Os
> >>>>   .Sh NAME
> >>>> @@ -2109,6 +2109,13 @@ A space-separated list of ZFS pool names for
> >>>> which new pool GUIDs should be
> >>>>   assigned upon first boot.
> >>>>   This is useful when using a ZFS pool copied from a template, such
> >>>> as a virtual
> >>>>   machine image.
> >>>> +.It Va zpool_upgrade
> >>>> +.Pq Vt str
> >>>> +A space-separated list of ZFS pool names for which version should
> >>>> be upgraded
> >>>> +upon first boot.
> >>>> +This is useful when using a ZFS pool generated by
> >>>> +.Xr makefs 8
> >>>> +utility.
> >>>
> >>> For someone who knows ZFS well, it is clear that only a zpool upgrade
> >>> is done. Not so experienced people may assume there is a combination
> >>> of zpool upgrade and zfs upgrade (more so for people which do not know
> >>> what the difference is). Maybe you want to add some explicit
> >>> documentation, that zfs upgrade + feature flags needs to be done by
> >>> hand.
> >>>
> >>> And this brings me to a second topic, we don't have an explicit list
> >>> of features which are supported by the bootloader (I had a look at the
> >>> zfs and the boot related man pages, if I overlooked a place, then the
> >>> other places should reference this important part with some text).
> >>
> >>
> >>    There is a fixed list of features we support in the boot loader:
> >>
> >>    /*
> >>  * List of ZFS features supported for read
> >>  */
> >> static const char *features_for_read[] = {
> >>         "org.illumos:lz4_compress",
> >>         "com.delphix:hole_birth",
> >>         "com.delphix:extensible_dataset",
> >>         "com.delphix:embedded_data",
> >>         "org.open-zfs:large_blocks",
> >>         "org.illumos:sha512",
> >>         "org.illumos:skein",
> >>         "org.zfsonlinux:large_dnode",
> >>         "com.joyent:multi_vdev_crash_dump",
> >>         "com.delphix:spacemap_histogram",
> >>         "com.delphix:zpool_checkpoint",
> >>         "com.delphix:spacemap_v2",
> >>         "com.datto:encryption",
> >>         "com.datto:bookmark_v2",
> >>         "org.zfsonlinux:allocation_classes",
> >>         "com.datto:resilver_defer",
> >>         "com.delphix:device_removal",
> >>         "com.delphix:obsolete_counts",
> >>         "com.intel:allocation_classes",
> >>         "org.freebsd:zstd_compress",
> >>         "com.delphix:bookmark_written",
> >>         "com.delphix:head_errlog",
> >>         "org.openzfs:blake3",
> >>         NULL
> >> };
> >>
> >>    Any feature not on this list will cause the boot loader to
> >> reject the pool.
> >>
> >>    Whether or not it should do that by default, always, or never is an
> open
> >>    question. I've thought there should be a 'shoot footing'
> >> override that isn't
> >>    there today.
> >>
> >
> > Thanks for the list. For those interested, it is in
> >     $SRC/stand/libsa/zfs/zfsimpl.c
> >
> > Just to make my opinion expressed before explicit again, this should
> > be documented in a boot / bootloader related man-page, but isn't.
> >
> > Should the above list be sorted in some way? Maybe in the same order
> > as the zpool-features lists them (sort by feature name after the
> > colon), or alphabetical?
>
> Is it OK if I commit this alphabetical sorting?
> ---snip---
> diff --git a/stand/libsa/zfs/zfsimpl.c b/stand/libsa/zfs/zfsimpl.c
> index 6b961f3110a..36c90613e82 100644
> --- a/stand/libsa/zfs/zfsimpl.c
> +++ b/stand/libsa/zfs/zfsimpl.c
> @@ -118,29 +118,29 @@ static vdev_list_t zfs_vdevs;
>    * List of ZFS features supported for read
>    */
>   static const char *features_for_read[] = {
> -        "org.illumos:lz4_compress",
> -        "com.delphix:hole_birth",
> -        "com.delphix:extensible_dataset",
> -        "com.delphix:embedded_data",
> -        "org.open-zfs:large_blocks",
> -        "org.illumos:sha512",
> -        "org.illumos:skein",
> -        "org.zfsonlinux:large_dnode",
> -        "com.joyent:multi_vdev_crash_dump",
> -        "com.delphix:spacemap_histogram",
> -        "com.delphix:zpool_checkpoint",
> -        "com.delphix:spacemap_v2",
> -        "com.datto:encryption",
>           "com.datto:bookmark_v2",
> -        "org.zfsonlinux:allocation_classes",
> +        "com.datto:encryption",
>           "com.datto:resilver_defer",
> +        "com.delphix:bookmark_written",
>           "com.delphix:device_removal",
> +        "com.delphix:embedded_data",
> +        "com.delphix:extensible_dataset",
> +        "com.delphix:head_errlog",
> +        "com.delphix:hole_birth",
>           "com.delphix:obsolete_counts",
> +        "com.delphix:spacemap_histogram",
> +        "com.delphix:spacemap_v2",
> +        "com.delphix:zpool_checkpoint",
>           "com.intel:allocation_classes",
> +        "com.joyent:multi_vdev_crash_dump",
>           "org.freebsd:zstd_compress",
> -        "com.delphix:bookmark_written",
> -        "com.delphix:head_errlog",
> +        "org.illumos:lz4_compress",
> +        "org.illumos:sha512",
> +        "org.illumos:skein",
> +        "org.open-zfs:large_blocks",
>           "org.openzfs:blake3",
> +        "org.zfsonlinux:allocation_classes",
> +        "org.zfsonlinux:large_dnode",
>           NULL
>   };
> ---snip---
>

This patch looks good because it's a nop and just tidies things up a bit.

Reviewed by: imp


> > As Mark already mentioned some flags, I checked the features marked
> > as read only (I checked in the zpool-features man page, including
> > the dependencies documented there) and here are those not listed in
> > zfsimpl.c. I would assume as they are read-only compatible, we
> > should add them:
> >     com.delphix:async_destroy
> >     com.delphix:bookmarks
> >     org.openzfs:device_rebuild
> >     com.delphix:empty_bpobj
> >     com.delphix:enable_txg
> >     com.joyent:filesystem_limits
> >     com.delphix:livelist
> >     com.delphix:log_spacemap
> >     com.zfsonlinux:project_quota
> >     com.zfsonlinux:userobj_accounting
> >     com.openzfs:zilsaxattr
>
> If my understanding is correct that the read-only compatible parts
> (according to the zpool-features man page) are safe to add to the zfs
> boot, here is what I have only build-tested (relative to the above
> alphabetical sorting):
> ---snip---
> --- zfsimpl.c_sorted        2022-11-09 12:55:06.346083000 +0100
> +++ zfsimpl.c        2022-11-09 13:01:24.083364000 +0100
> @@ -121,24 +121,35 @@
>           "com.datto:bookmark_v2",
>           "com.datto:encryption",
>           "com.datto:resilver_defer",
> +        "com.delphix:async_destroy",
>           "com.delphix:bookmark_written",
> +        "com.delphix:bookmarks",
>           "com.delphix:device_removal",
>           "com.delphix:embedded_data",
> +        "com.delphix:empty_bpobj",
> +        "com.delphix:enable_txg",
>           "com.delphix:extensible_dataset",
>           "com.delphix:head_errlog",
>           "com.delphix:hole_birth",
> +        "com.delphix:livelist",
> +        "com.delphix:log_spacemap",
>           "com.delphix:obsolete_counts",
>           "com.delphix:spacemap_histogram",
>           "com.delphix:spacemap_v2",
>           "com.delphix:zpool_checkpoint",
>           "com.intel:allocation_classes",
> +        "com.joyent:filesystem_limits",
>           "com.joyent:multi_vdev_crash_dump",
> +        "com.openzfs:zilsaxattr",
> +        "com.zfsonlinux:project_quota",
> +        "com.zfsonlinux:userobj_accounting",
>           "org.freebsd:zstd_compress",
>           "org.illumos:lz4_compress",
>           "org.illumos:sha512",
>           "org.illumos:skein",
>           "org.open-zfs:large_blocks",
>           "org.openzfs:blake3",
> +        "org.openzfs:device_rebuild",
>           "org.zfsonlinux:allocation_classes",
>           "org.zfsonlinux:large_dnode",
>           NULL
> ---snip---
>
> Anyone able to test some of those or confirms my understanding is
> correct and would sign-off on a "reviewed by" level?
>

I'm inclined to strongly NAK this patch, absent some way to test it.
There's no issues today with any of them being absent causing
problems on boot that have been reported. The ZFS that's in the
boot loader is a reduced copy of what's in base and not everything is
supported. There's no urgency here to rush into this. The ones that
are on the list already are for things that we know we support in the
boot loader because we've gone to the trouble to put blake3 or sha512
into it (note: Not all boot loaders will support all ZFS features in the
future... x86 BIOS booting likely is going to have to be frozen at its
current ZFS feature set due to code size issues).

While most of these options look OK on the surface, I'd feel a lot better
if there were tests for these to prove they work. I'd also feel better if
the ZFS experts could explain how those come to be set on a zpool
as well. I'd settle for a good script that could be run as root (better
would be not as root) that would take a filesystem that was created
by makefs -t zfs and turn on these features after an zpool upgrade.
I have the vague outlines of a test suite for the boot loader that I
could see about integrating something like that into, but most of my
time these days is chasing after 'the last bug' in some kboot stuff I'm
working on (which includes issues with our ZFS in the boot loader
integration).

So not a hard no, but I plea for additional scripts to create images
that can be tested.

Warner


> Bye,
> Alexander.
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
>