Re: unable to boot latest 14-stable

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 12 Oct 2023 06:20:10 UTC
On Oct 11, 2023, at 21:34, void <void@f-m.fm> wrote:

> Hi Mark,

Hello.

> I think the zpool upgrade thing is a red herring, a symptom of the root cause.

You reported text showing the loader's
messages rejecting the attempted use of the pool.
That text was output before the FreeBSD kernel
had been loaded:

QUOTE
Consoles: EFI console 
GELI Passphrase for disk0p3: 

Calculating GELI Decryption Key for disk0p3: 422818 iterations...
ZFS: unsupported feature: com.delphix:head_errlog
ZFS: pool zroot is not supported
Reading loader env vars from /efi/freebsd/loader.env
Setting currdev to disk0p1:
FreeBSD/arm64 EFI loader, Revision 1.1

Command line arguments: loader.efi
END QUOTE

You also reported that replacing the msdosfs
content fixed the problem, if I understand right.
That in turn means that only that material changed
and that the FreeBSD kernel/world was not changed by
that specific update.

If you still have access to the loader vintage
from before that msdosfs update, you could put
just that one file back and see what happens.
(This would cross-check whether other msdosfs
content was somehow involved.)

> On the 30th September the machine was upgraded from 13-stable of that date to 14-stable of that date. When it became 14-stable, zpool was upgraded.
> 
> In the days between then and 11th October, the machine was rebooted several times,
> each time it came back up normally.
> 
> Around the 10-11th October, sources were updated and a new buildworld etc cycle
> happened, installing a slightly later 14-stable version.
> 
> The boot failure would seem to indicate the msdos materials needed for booting
> changed between the 30th Sept and 10-11th October. Either that or something
> the materials invoke on the zfs side of things.

What all did you change vs. not change when you did
the msdosfs content update? Did you change anything
else? (I'm presuming not.)

> The problem I'm having
> is that I cannot find anything relevant for this on cgit in the timeframe

This I do not understand. The loader's build would be
dependent on the openzfs source code. From what I see
you are indicating that the Sept-30 stable/13 to stable/14
conversion had the openzfs source code from the following:

Sun, 24 Sep 2023
    • . . .
    • git: 6cfb90c6ebe4 - stable/14 - zfs: merge openzfs/zfs@5f3069867 (zfs-2.2-release) into stable/14 Martin Matuska

but did not have the openzfs source code from either of
the following:

Wed, 04 Oct 2023
    • . . .
    • git: a21cb0234b89 - stable/14 - zfs: merge openzfs/zfs@8015e2ea6 (zfs-2.2-release) into stable/14 Martin Matuska

Sun, 08 Oct 2023
    • git: fdc38bc6cd28 - stable/14 - zfs: merge openzfs/zfs@2407f30bd (zfs-2.2-release) into stable/14 Martin Matuska

So, unless you have analyzed that source code to know that
none of that more recent source code was involved in the
loader's build, there was plenty of opportunity for the
loader to have changed behavior by openzfs source changing.

None of that needs to reach the stage where the kernel
is loaded and started in order for there to be a potential
difference in loader behavior.
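As a rough illustration of that reasoning, a small Python sketch can list which of the openzfs merges quoted above landed between the two source updates. It assumes the trees were synced on about Sept 30 and Oct 10 and approximates "included" by commit date only; `merges_between` is a hypothetical helper. In a real checkout, `git log <old>..<new> -- sys/contrib/openzfs stand` (with the two checkouts' commit hashes filled in) would give the authoritative answer.

```python
from datetime import date

# openzfs merges into stable/14 as quoted above: (commit hash, commit date).
# Assumption: commit date stands in for "was present in the synced tree".
ZFS_MERGES = [
    ("6cfb90c6ebe4", date(2023, 9, 24)),
    ("a21cb0234b89", date(2023, 10, 4)),
    ("fdc38bc6cd28", date(2023, 10, 8)),
]

def merges_between(old_sync: date, new_sync: date) -> list[str]:
    """Return merges committed after old_sync and on or before new_sync."""
    return [h for h, d in ZFS_MERGES if old_sync < d <= new_sync]

# Sept-30 build vs. the later source update (assumed Oct 10).
print(merges_between(date(2023, 9, 30), date(2023, 10, 10)))
```

Both later merges show up, so the loader built from the later sources may differ from the Sept-30 one even though nothing in UPDATING mentions it.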

>  and there's no mention of anything relevant to this in UPDATING, so I can't anticipate the problem happening again.


UPDATING does not document all the implications of openzfs bugs
or fixes to openzfs bugs. (Similarly for FreeBSD's own bugs,
although more is likely known for them.) As far as I know, no
one tries to figure out all those implications.

As far as I can see, the risk at issue is best reduced
by doing both of the following, as has been stated in
various ways multiple times:

A) Always update the loader just before doing the pool upgrade.
B) Delay the pool upgrade until others have tested it well, over a notable period of time.

(There is no implication about delaying the updates to FreeBSD,
just delaying the pool upgrades.)
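Step A matters because the loader refuses a pool that needs a feature it does not know, as in the quoted "unsupported feature" messages. A minimal Python model, assuming the loader's decision reduces to a set difference between the features the pool needs for reading and the features the loader was built to understand; the function names and feature sets are hypothetical and illustrative, not the real FreeBSD loader's list.

```python
# Hypothetical model of the loader-side check: every feature the pool
# needs for reading must be known to the loader. Feature sets below are
# illustrative only, not the real loader's supported list.

def unsupported_features(pool_read_features, loader_supported):
    """Return, sorted, the pool features this loader does not know."""
    return sorted(set(pool_read_features) - set(loader_supported))

def loader_accepts(pool_name, pool_read_features, loader_supported):
    """Print loader-style messages for a rejected pool; return acceptance."""
    missing = unsupported_features(pool_read_features, loader_supported)
    for name in missing:
        print(f"ZFS: unsupported feature: {name}")
    if missing:
        print(f"ZFS: pool {pool_name} is not supported")
    return not missing

# An older loader that predates a feature the upgraded pool now needs,
# and a newer loader that knows it.
old_loader = {"org.open-zfs:large_blocks", "com.delphix:extensible_dataset"}
new_loader = old_loader | {"com.delphix:head_errlog"}
pool_needs = {"com.delphix:extensible_dataset", "com.delphix:head_errlog"}

loader_accepts("zroot", pool_needs, old_loader)  # False, prints both messages
loader_accepts("zroot", pool_needs, new_loader)  # True
```

In this model, updating the loader first (A) ensures `loader_supported` grows before `pool_read_features` does.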

===
Mark Millard
marklmi at yahoo.com