Re: boot hangs after installworld at FreeBSD 14.0-CURRENT main-n248198-72f7ddb587a

From: Gary Jennejohn <gljennjohn_at_gmail.com>
Date: Mon, 26 Jul 2021 09:13:19 UTC
On Mon, 26 Jul 2021 08:19:41 +0200
Gary Jennejohn <gljennjohn@gmail.com> wrote:

> On Sun, 25 Jul 2021 19:02:29 +0200
> Gary Jennejohn <gljennjohn@gmail.com> wrote:
> 
> > On Sun, 25 Jul 2021 09:54:35 -0600
> > Warner Losh <imp@bsdimp.com> wrote:
> >   
> > > On Sun, Jul 25, 2021 at 3:30 AM Gary Jennejohn <gljennjohn@gmail.com> wrote:
> > >     
> > > > I updated my FBSD-14 tree yesterday.
> > > >
> > > > uname -a shows FreeBSD 14.0-CURRENT #5 main-n248198-72f7ddb587a.
> > > >
> > > > Did a buildkernel and a clean buildworld yesterday.
> > > >
> > > > This morning I booted the new kernel, did an installworld and rebooted
> > > > the new kernel.
> > > >
> > > > Or, should I say, I tried to reboot the new kernel.
> > > >
> > > > During boot I see the following output:
> > > >
> > > > loading /boot/defaults/loader.conf
> > > > /
> > > >
> > > > and the boot hangs.
> > > >
> > > > The second line should have contained
> > > > /boot/test/kernel (I always install new kernels to /boot/test)
> > > >
> > > > followed by lines containing the various modules which get loaded.
> > > >
> > > > Luckily, I had a USB thumb drive with a FreeBSD memstick.img AND a
> > > > complete backup of the old /boot, so I could boot from the thumb
> > > > drive and restore /boot (but I moved /boot to /boot.bad before I
> > > > did that just in case).  With the restored (old) /boot everything
> > > > works.
> > > >      
> > > 
> > > Little has changed in the boot loader. Do you know the hash that worked? Or,
> > > if I misread the above, the hash that failed?
> > >     
> > 
> > The /boot code which works was installed at 07:36 UTC July 9th. So,
> > every change to the boot code since then is a potential culprit.
> > 
> > Example: 9c1c02093b90ae49745a174eb26ea85dd1990eec change to support.4th.
> > It just so happens that I had a nextboot.conf in the "bad" /boot at the
> > time that the hang occurred.  This is the only potential candidate I
> > can see.
> > 
> > So I'll try overwriting support.4th with the known-good version and
> > see what happens.  But probably not until tomorrow my time.
> >   
> 
> After deleting the nextboot.conf from the "bad" /boot I was able to
> boot using the "bad" /boot.  That's the only change between this boot
> and the previous boot which hung the computer.  Whether this is a
> strong hint that the change to support.4th is the culprit I can't say,
> but since the commit message explicitly mentions nextboot.conf as a
> reason for the change, that may very well be the case.
> 
> I decided that removing nextboot.conf was a better test than using
> the old support.4th.
> 
> The change looks very simple and innocent, but my 4th knowledge is
> pretty much non-existent, so I don't really understand what it does.
> 
> I went back to the "old" /boot because I use nextboot.conf a lot.
> 

I decided to do a further test.

I wanted to check whether some other change in the loader might be
implicated in the hung boot.

I copied the old support.4th to the bad /boot (/boot.b).  I then mv'd
/boot to /boot.g and /boot.b to /boot and created a nextboot.conf
there.  After these steps I was still able to boot successfully.
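
For anyone who wants to repeat the test, the steps above can be sketched
roughly as the shell function below. It is only a sketch under a few
assumptions: the "root" argument is a hypothetical prefix so the swap can
be rehearsed in a scratch directory first (on the real system it would be
empty), and the nextboot.conf contents are assumed, modelled on what
nextboot(8) normally writes, with "test" standing in for the kernel
directory under /boot.

```shell
#!/bin/sh
# Sketch of the directory swap described above; not a verified procedure.
# $1 is a hypothetical root prefix (empty on the live system).
swap_boot_dirs() {
    root="$1"

    # Put the known-good support.4th into the suspect tree (/boot.b).
    cp "$root/boot/support.4th" "$root/boot.b/support.4th" || return 1

    # Park the working tree as /boot.g and promote the suspect tree.
    mv "$root/boot" "$root/boot.g" || return 1
    mv "$root/boot.b" "$root/boot" || return 1

    # Recreate a nextboot.conf in the promoted tree.
    # Contents are an assumption; kernel="test" mirrors /boot/test.
    printf 'nextboot_enable="YES"\nkernel="test"\n' \
        > "$root/boot/nextboot.conf"
}
```

Calling swap_boot_dirs "" would act on the live /boot, so a dry run
against a copy in a temporary directory is the safer first step.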

So, I'd say that this result is a pretty strong indication that the
new support.4th together with a nextboot.conf results in the hung
boot.

-- 
Gary Jennejohn