[Bug 277886] ZFS boot loader gives up too easily on unsupported zpool flags

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 07 Apr 2024 01:51:32 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277886

--- Comment #3 from Tomoaki AOKI <junchoon@dec.sakura.ne.jp> ---
(In reply to Xin LI from comment #2)
> I think the ask is simply make check_feature() and check_mos_features()
> (/stand/libsa/zfs/zfsimpl.c) to always return 0 instead of EIO even if
> the feature is not supported, so probably the boot code would emit a lot of
> errors, but eventually somehow able to read in the kernel (which is supposed
> to have the support, as the upgrade happened when the kernel is running).

Basically disagree, but agreed if only allowing "unsupported, enabled but not
active" features.
At worst, broken (actually is in checksum mismatch and/or un-decompressed)
kernel and modules are loaded, passed control and causing undefined behaviour,
which can be fatal.


> There may be some useful scenarios, for example, if a boot code that can't
> support zstd compression is used, and boot file system never had zstd
> compression, then the boot code should, in theory, be able to have sufficient
> ability to boot from the pool.

IIUC, it would be able to check whether the feature is active or not.
Just to be enabled is not an actual problem, but once the feature is actually
used, the state becomes forcibly active (usually on first write access after
enabled).
If the code sizes of legacy zfsboot and gptzfsboot allows, it could be
possible.
Not sure how much space are left and how much codes are needed to implement.
Maybe imp@ would be the best person to make decision here.


> Personally I don't really like this idea, though, because it is not guaranteed
> to work and may expose bugs of the read-only code (because we now broke
> the calling contract).  It would be much better to prevent the upgrade from
> happening on the boot pool, and only allow it when boot environment is updated,
> instead.

Agreed.


> If I was tasked to implement something for this, I'd probably do it by extending
> zpool utility to ask an OS dependent binary about "Is feature X supported by
> the current boot environment" for each read-only incompatible features,
> if the pool have 'bootfs' set, and refuse to upgrade (unless -f is specified)
> these boot pools if any feature is not supported.

Looks reasonable to me. But what's needed to be checked is NOT only the boot
environment, but also the boot codes themselves, too.
(As boot codes in freebsd-boot partition and/or ESP are not parts of ZFS BEs.)
Still remaining problem is that there could be any 3rd party boot loaders
supporting boot FreeBSD from ZFS.


> A naive implementation of the checker can be done by doing something like:

> 1. Gather a list of all partitions that is "freebsd-boot" type or "efi" type
> (this is actually not right; the EFI partition should be mounted and the actual
> EFI/BOOT/BOOT${ARCH}.EFI should be checked instead), and append /boot/loader

/boot/loader should be /boot/zfsloader, unless they are already the things with
different names for compatibility. Anyway, /boot/zfsloader should be preferred
whenever it exists.
Without it, upgrading from (could be non-supported) old versions which both
were actually different, checking for /boot/loader clearly causes
false-negative.


> 2. Check if the feature string is present in the listed files, and return false
> if any of them do not have the string.

Maybe the easiest (but far from perfect) way would be to implement codes for it
in zpool as FreeBSD-specific. Implementing for all supported OS'es by OpenZFS
would be much harder and would be harder to be accepted by upstream.

Checks for freebsd-boot pattition should be simply searching the whole
partition.
Possibly, gpart would be required to zero'ing whole partition before writing
boot code to it, not to be confused by remnants of old contents.

For ESP, yes, it should be mounted to check. And at least, the presense and
contents of
  EFI/freebsd/loader.efi
  EFI/freebsd/boot1.efi (at least if EFI/freebsd/loader.efi does not exist)
  EFI/boot/boot${ARCH}.efi (including the check whether it's FreeBSD's one or
not)
should be checked. To be paranoid, possibly, EFI/freebsd/BOOT${ARCH}.efi, too?

Oops...too many things to consider. And still, maybe I'm overlooking something
fatal.
But it's clearly worth considering to avoid unbootable OS'es.

-- 
You are receiving this mail because:
You are the assignee for the bug.