loader.efi architecture for replacing boot1.efi
Warner Losh
imp at bsdimp.com
Sat Dec 16 03:28:02 UTC 2017
On Fri, Dec 15, 2017 at 7:05 PM, Warner Losh <imp at bsdimp.com> wrote:
>
>
> On Dec 15, 2017 6:43 PM, "Eric McCorkle" <eric at metricspace.net> wrote:
>
> On 12/15/2017 20:09, Warner Losh wrote:
>
> > This should be second. UEFI variables trump all.
> >
> > 2) If not, then attempt to read EFI vars to determine the boot
> > location
> >
> > 3) If no EFI vars are defined, and no partition was specified, fall
> > back to looking for an installed system on devices
> >
> >
> > This is fine, so long as it is only on the device that the loader loaded
> > from.
>
> It's fine if it's configurable, but there needs to be sane behavior if
> the EFI vars aren't set.
>
>
> Where do we get this info for such a broken setup? Do you have actual
> examples?
>
> > 4) At the very last, do the legacy (what loader.efi currently does)
> > behavior.
> >
> >
> > This is bogus. It violates the uefi boot loader protocol. We must
> > abandon this legacy behavior. The behavior is actively harmful since
> > something random will boot. This has caused actual operational issues at
> > Netflix. Guessing is really bad.
>
> We can't just ditch the current behavior and break everyone's existing
> install, though. Legacy behavior should be supported at least until the
> next major release.
>
>
> What useful setups does this break? Absent a real example, we absolutely
> are breaking this. Keeping it has a real cost: as the de facto maintainer
> of stand, I'm unwilling to maintain it, test it, or commit to not breaking
> it. The legacy behavior is broken and has caused me hours of pain in
> production. There has been no articulated use case this enables, especially
> since the boot loader can be interrupted to specify something in recovery
> scenarios.
>
>
> >
> > Step (3) is done by attempting to stat /boot/loader.conf and
> > /boot/kernel. First, all partitions on the same disk are searched,
> > then all remaining partitions are searched.
> >
> > This should allow mechanisms like EFI vars and command-line args to
> > work without interference from the fallback mechanisms. However, it also
> > provides robustness in the face of failure modes and uninitialized
> > systems (I personally ran into a problem a while back with a linux
> > system, where I couldn't boot with EFI, because the EFI vars weren't
> > set, because I couldn't set them if I couldn't boot with EFI; had to
> > use Shell.efi to sort out the mess...)
> >
> > More importantly, it provides a seamless transition from the way
> > things are now to the way we want things to be.
> >
> > Please provide comments and feedback.
> >
> >
> > Please listen when I say searching all devices is actively harmful. The
> > uefi boot manager, which I'm in the process of bringing in, offers a way
> > to specifically say what you want to boot. If someone needs something
> > complicated, they must use that moving forward. Part of what makes the
> > protocol work is loaders giving up early so the next one on the list can
> > be tried.
>
> We also have to deal with the reality that some EFI implementations are
> adversarial. We have to be able to deal with implementations that make
> it difficult to set EFI vars, or which mess with their values (Lenovo is
> particularly notorious for this).
>
> You can disable fallback mechanisms with command-line args or macros or
> whatever, but they need to be there.
>
>
> No. Absent a sane use case, I refuse. Give me a reasonable use case, I
> will reconsider.
>
>
So the current behavior leads to absurd results that nobody else produces,
and that we don't produce for legacy boot:

If we boot loader.efi/boot1.efi off a hard drive and find there's no
kernel, we'll load off a CD-ROM or a floppy if we happen to find a kernel
there. That's nuts. What's more, we'll load off a different device (say a
thumb drive), which is also crazy. The last thing you want is to
accidentally pick the thumb drive recovery kernel that happens to be in a
USB slot when you have a primary and secondary partition on two main disks,
but today's behavior chooses that. It's so crazy that I can see no benefit
from supporting, testing and maintaining this. If someone wants to recover
a system, they can do it at the boot loader prompt now (they couldn't
before). If someone really wants to boot this crazy thing, we have a new way
to specify it exactly, without any ambiguity from devices moving around.
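
To make the difference concrete, here is a minimal Python sketch. It is not
the loader code: the Partition type, the pick_root function and the device
names are all hypothetical, and has_kernel merely stands in for a successful
stat of /boot/kernel. It contrasts searching every device with searching
only the device the loader was loaded from.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Partition:
    device: str       # hypothetical device name, e.g. "disk0"
    name: str         # hypothetical partition name, e.g. "disk0p2"
    has_kernel: bool  # stands in for "stat /boot/kernel succeeded"


def pick_root(boot_device: str, parts: Sequence[Partition],
              search_all: bool = False) -> Optional[Partition]:
    """Return the partition to boot from, or None to give up.

    search_all=False: only partitions on boot_device are considered.
    search_all=True: same-device partitions first, then every other
    device in enumeration order (the behavior argued against above).
    """
    same = [p for p in parts if p.device == boot_device]
    others = [p for p in parts if p.device != boot_device]
    candidates = same + others if search_all else same
    for p in candidates:
        if p.has_kernel:
            return p
    return None  # give up so the next boot entry can be tried


# Assumed enumeration order; real firmware may order devices differently.
parts = [
    Partition("disk0", "disk0p2", has_kernel=False),  # boot disk, no kernel
    Partition("usb0", "usb0p1", has_kernel=True),     # recovery thumb drive
    Partition("disk1", "disk1p2", has_kernel=True),   # second main disk
]

restricted = pick_root("disk0", parts)
everything = pick_root("disk0", parts, search_all=True)
print(restricted)       # None
print(everything.name)  # usb0p1
```

With this enumeration order, searching everything lands on the thumb drive,
while the restricted search returns nothing, so the firmware can move on to
the next entry in its boot list.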

We already support about 100 boot scenarios that are hard enough to test. I
don't want to commit to supporting this and making it 120 or 150 once you
work out all the combinatorics. We have to trim useless things from the
matrix. So absent a use case that makes sense, that people are actually
doing, I'm having a hard time justifying keeping it around as we transition.

Warner

P.S. On x86, we support geli/nogeli, gpt/mbr, ufs/zfs, and uefi/legacy/both
(24 combinations). Plus we support booting off CDROM, netbooting, etc. For
arm and arm64 we have a similar number that are possible: zfs/ufs,
u-boot/uefi, and mbr/gpt (plus a number of different u-boot boards). For
mips we have a similar mix. On powerpc we support 4 or 6 ways. It's just too
much to hope to test and ensure it all works. Each new thing has a
non-trivial cost, and I see zero benefit from this one more thing,
especially since it gets in the way of UEFI boot manager support.
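
For anyone who wants to check the x86 arithmetic, a minimal Python sketch
follows. The axis names are labels I made up for illustration; the values
are the ones listed above. The extra "fallback" axis is hypothetical and
only shows how one more independent switch multiplies the matrix. It is a
naive upper bound, since not every combination need be meaningful.

```python
from itertools import product

# Axes as listed in the P.S. for x86; the key names are illustrative only.
x86_axes = {
    "encryption": ["geli", "nogeli"],
    "partitioning": ["gpt", "mbr"],
    "filesystem": ["ufs", "zfs"],
    "firmware": ["uefi", "legacy", "both"],
}

x86_matrix = list(product(*x86_axes.values()))
print(len(x86_matrix))  # 24

# Hypothetical extra axis: an on/off switch for searching all devices.
with_fallback = list(product(*x86_axes.values(), ["fallback", "nofallback"]))
print(len(with_fallback))  # 48
```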