Re: Loader can't find /boot/ua/loader.lua on UFS after main-n255828-18054d0220c

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Mon, 30 May 2022 22:34:32 UTC
In message <CANCZdfpqiOj595_uU=kWg4qD37GCO1zXZo6B2MUbWTokUeugHg@mail.gmail.com>,
Warner Losh writes:
>
> On Mon, May 30, 2022 at 8:14 AM Toomas Soome <tsoome@me.com> wrote:
>
> >
> >
> > On 30. May 2022, at 17:06, Warner Losh <imp@bsdimp.com> wrote:
> >
> >
> >
> > On Mon, May 30, 2022 at 4:26 AM David Wolfskill <david@catwhisker.org>
> > wrote:
> >
> >> On Mon, May 30, 2022 at 08:40:10AM +0300, Toomas Soome wrote:
> >> > ...
> >> > Does loader_4th have same issue?
> >> > ....
> >>
> >> I don't know; I hadn't tried it.  I will do so later today & report
> >> back.
> >>
> >
> > So if it's only one system, and it's only UFS, then what does fsck of that
> > UFS system tell you?
> > The loader can't find its UFS filesystem to read the configuration from.
> > So either it's having trouble finding the device (unlikely since that
> > code hasn't changed in a long time), or it's having heartburn with the
> > UFS system for some reason it's being silent about (within the realm of
> > possibilities because there might be an unknown edge case in Kirk's
> > recent UFS integrity changes). I suspect that the 4th boot loader will
> > have the same issue, but a different error message.
> >
> > Others have reported issues with GELI, but that's not in play here, if I'm
> > reading this correctly. Right?
> >
> > Warner
> >
> >
> > Yes, that's why I was asking about loader_4th. I’m trying to spot the issue
> > from a UFS image sample.
> >
>
> I thought it was a good suggestion. My guess on it not working wasn't to
> imply it wasn't.

Backing out 076002f24d35962f0d21f44bfddd34ee4d7f015d restored the one 
machine of mine that did have the problem. The other three were fine with 
that commit.

To summarize, things I tried:

- Reinstall all boot blocks.
- Set currdev to my USB rescue disk: ls works, boots fine.
- Boot from my USB rescue disk, set currdev to the boot disk: boots.
- Boot from the USB rescue disk, copy /boot/loader* to the boot disk: works
  around the problem.
- Revert 076002f24d35962f0d21f44bfddd34ee4d7f015d: resolves the problem.

Other data points:

My three AMD machines on Asus motherboards had no problem with the commit. 
My Acer laptop with Intel CPU suffered the same problem. Could it be that 
malloc() worked on the Asus/AMD machines while it failed on the Acer/Intel 
laptop?

If my hunch that this is caused by a malloc() failure is right, would it be 
a good idea to print a nasty warning when malloc failures occur in the 
loader? Silently failing, with weird behaviour as the result, is more of a 
POLA violation than a nasty message. If not this, a loader variable to 
enable verbose messages might help in debugging these kinds of problems. 
Again, this assumes that my hunch about a malloc() failure is what actually 
happened.
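
To make the warning idea concrete, here is a rough sketch in standalone C. 
It is not taken from the loader sources: warn_malloc(), WARN_MALLOC(), 
raw_alloc, failing_alloc() and alloc_failures are all hypothetical names 
made up for illustration, and the real loader allocator may be structured 
quite differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter of failed allocations, handy for debugging. */
static unsigned long alloc_failures;

/*
 * Hypothetical hook for the underlying allocator. It defaults to
 * malloc() and exists only so a failure can be simulated.
 */
static void *(*raw_alloc)(size_t) = malloc;

/* Stand-in allocator that always fails, used to exercise the warning. */
static void *
failing_alloc(size_t size)
{
	(void)size;
	return (NULL);
}

/*
 * Allocate memory and complain loudly on failure instead of letting
 * the caller fail silently later on.
 */
static void *
warn_malloc(size_t size, const char *file, int line)
{
	void *p = raw_alloc(size);

	if (p == NULL && size != 0) {
		alloc_failures++;
		fprintf(stderr, "WARNING: malloc(%zu) failed at %s:%d\n",
		    size, file, line);
	}
	return (p);
}

/* Record the call site automatically. */
#define	WARN_MALLOC(size)	warn_malloc((size), __FILE__, __LINE__)
```

A caller would use WARN_MALLOC(n) wherever it uses malloc(n) today and 
still check for NULL. Pointing raw_alloc at failing_alloc forces the 
failure path, so the message can be seen without actually running out of 
memory.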


-- 
Cheers,
Cy Schubert <Cy.Schubert@komquats.com> or <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e**(i*pi)+1=0