Re: 13.3R's installworld killed system--please help!

From: <ax61_at_disroot.org>
Date: Sun, 08 Sep 2024 09:15:43 UTC
Hi Scott,

In short, all I can give is a vague suggestion: fix or update the
bootcode for FreeBSD 13.3, along with any "/boot/loader.conf" &
"/etc/rc.conf" settings which could be causing issues.


I am recreating this reply, as I had already deleted the original mails.
References:
    https://lists.freebsd.org/archives/freebsd-questions/2024-September/005640.html
    https://lists.freebsd.org/archives/freebsd-stable/2024-September/002368.html

I have heavily reformatted & edited the original mail to make sense of
the situation. My questions are for my own clarification, or to gain
information for others to possibly help.


Scott B wrote ...
> One is a laptop with an AMD A6 CPU
...
> The peripheral storage is a SanDisk SATA 476.9 GB SSD limited to
> SATA-2 speeds
...
> This is the machine with which I am currently ssh'ed into a system
> at sdf.org, where I deal with email.
> 
> The other computer is a Dell tower with a Core2 Extreme (QX9650) CPU
...
> The peripheral storage devices are two ~.9 TB HDDs and six ~1.86 TB
> or larger capacity HDDs.
> 
> The two smaller drives and two of the larger drives are attached to
> the four available internal SATA-2 ports,
> 
> and three of the other drives are attached to USB 3.0 ports on
> add-in cards on the motherboard.
> 
> The last drive is attached to a port on a Jmicron ESATA port at
> SATA-1 speed.

AMD laptop - Email use
    - FreeBSD 13.1*
    - SanDisk 477 GB SSD -- INternal, SATA 2

Dell Extreme - tower
    - FreeBSD 12.2*
    - 2x 0.9 TB HDD -- INternal, SATA 2
    - 6x 1.86+ TB HDDs
       -- 2x 1.86+ TB -- INternal, SATA 2
       -- 3x 1.86+ TB -- INternal, USB 3
       -- 1x 1.86+ TB -- EXternal, SATA 1


> The two smallest drives are the boot devices and are each
> partitioned with the boot loader's UFS2 partition,
> a a partition containing one component of a two-way ZFS mirror that
> is a boot partition with a pool name of "system".
> 
> There is also a small partition on each for the crash dump area on
> one and /var/crash (UFS2) on the other.  There is also a 2 GB
> partition on each drive for a GEOM mirror that supports a UFS2
> partition for an application.
> 
> Lastly, most of the remaining space on the two small drives are a
> two-way ZFS mirror containing /usr/home and potentially other file
> systems.  That pool's name is "local".
> 
> The four remaining pools have several things, including the two
> remaining pools ("rz7A", a raidz2 pool with 6 components totalling
> ~10.4 TB, and "zmisc", comprising two mirrored vdevs and totalling
> 99 GB)
> and three small GEOM mirrors of varying sizes GEOM-concatenated
> together to hold a UFS2 file system for a work area for ccache trees
> and WRKDIRPREFIX for portmaster(8).
> 
> "system", "local", and "rz7A" are all on GELI-encrypted partitions.
> "zmisc" is not encrypted.

Could you post the partition layout as shown by "gpart", or whatever
else ("fdisk"?) works?
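
Something along these lines could collect the layout into one text
file to paste into a reply. It is only a sketch: the output path is a
made-up default, and I am assuming the commands are run from a live or
installer system, or with the disks attached to the laptop, since the
tower itself does not boot. Tools that are missing are noted rather
than treated as errors.

```shell
#!/bin/sh
# Sketch: gather the storage layout into one text file for posting.
# REPORT is a hypothetical output path; change it as needed.
REPORT=${REPORT:-/tmp/storage-layout.txt}

# Print a header, then the command's output (or a note if it is missing).
run() {
    printf '### %s\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(%s: command not found)\n' "$1"
    fi
    echo
}

{
    run gpart show -l      # partition tables with GPT labels
    run gpart show -p      # same tables, with provider names (ada0p1, ...)
    run geli status        # attached GELI providers
    run gmirror status     # GEOM mirrors
    run gconcat status     # GEOM concat devices
    run zpool status       # imported ZFS pools and their vdevs
} > "$REPORT"

echo "wrote $REPORT"
```

Pools that are not imported will not show up in "zpool status"; a bare
"zpool import" lists pools available for import without importing them.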

Partitions, as I understood them, on the 2x small disks ...
    UFS
    ----
       - boot - ? GB
       - crash-dump - ? GB
       - var/crash - ? GB
       - <application> - 2 GB GEOM mirror
    ZFS
    ----
       - "system" - ZFS mirror, bootfs - ? GB; GELI
       - "local" - ZFS mirror; /usr/home + etc - rest GB; GELI


Presumably on some combination of the larger, 1.86+ TB disks (far too
many partitions to place without knowing the layout); partitions ...
    ZFS
    ----
       - "rz7A" - RAID-Z2 ~10.4 TB; GELI
       - "zmisc" - ZFS mirror 2x, 99 GB
    UFS
    ---
       - concat'd 3x GEOM mirror, gmirror, as UFS2 fs


> For many years I have been uprading FreeBSD from source, but decided
> to try the freebsd-update(8) process when 13.2-RELEASE was released.

Looks like that was on the laptop (see later for context)?


> I did that and quickly discovered that it had completely undone much
> or all of my OS configuration, especially the network configuration,
> that I had tailored to my needs, so I wasted seemingly endless hours
> over weeks repairing all that into some semblance of usable form.

"freebsd-update" asks about how to merge files or informs that it would 
not
touch the files (that is based on my experience on FreeBSD 1[34]; do not 
know,
or care, how it behaved before).


> Meanwhile I had proceeded with source upgrade of the tower to
> 12.3-RELEASE-p[x].

#Dell Extreme- is now on 12.3*

> When 13.3-RELEASE became available, I first did a source upgrade on
> the laptop.  Running "make installworld" rendered the laptop
> unbootable.

What steps were taken during source upgrade? What were the steps
before "make installworld"?
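
For comparison, the order I would expect for a source upgrade to 13.x,
going by /usr/src/UPDATING, is roughly as below. This is a sketch that
only prints the steps and runs nothing; it assumes the sources are in
/usr/src and the default kernel configuration, and it leaves out
optional steps such as "make delete-old".

```shell
#!/bin/sh
# Prints the usual order of a source upgrade; nothing is executed.
# Assumes sources in /usr/src and the default kernel configuration.
upgrade_steps() {
    cat <<'EOF'
cd /usr/src
make buildworld
make buildkernel
make installkernel
shutdown -r now
etcupdate -p
cd /usr/src
make installworld
etcupdate -B
shutdown -r now
EOF
}

upgrade_steps
```

The points of interest are the reboot onto the new kernel before
"make installworld", and "etcupdate -p" before it with "etcupdate -B"
after it.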


> but eventually I removed the SSD from it and loaded it into a USB
> 3.0 docking station and attached it to the tower and replicated its
> entire pool ("sysroot") into my largest pool attached to the tower.

"sysroot" just made the appearance!


> Then I reinstalled the SSD into the laptop.

> After downloading the 13.3-RELEASE ... writing it to a thumb drive,
> I booted that on the laptop and installed 13.3-RELEASE from scratch.

#AMD laptop- is now on 13.3-R


> That experience delayed my upgrading the tower to 13.3-RELEASE-p1
> for several months, although I had compiled it on the tower and had
> it ready to install, but first I had upgraded the tower to
> 12.4-RELEASE-p2.

#Dell Extreme- 12.4*


> Finally I dared to try it a day and several hours ago.  I first
> created a boot environment to preserve the current system and also
> made a snapshot of all ZFS file systems in order to have a potential
> rollback point before beginning the installkernel step.

#Dell Extreme- in process of updating to 13.3


> I did that, completed the etcupdate steps, and then did a "shutdown
> -r now".

What options (modes) of "etcupdate" were used & when?

Looks like you did not update the boot loader.
Hopefully you did not update the ZFS boot pools either?


> After the boot loader asks me for the GELI passphrase for the boot
> pool, I now get the following.
> 
> Calculating GELI Decryption Key for disk0p2: 1563240 iterations...
> 
> BTX loader 1.00  BTX version is 1.02
> 
> After those two lines the blinking cursor jumps up three lines and
> moves to the beginning of the line--not sure which happens first
> because it's too fast.
> 
> After a delay of several seconds, it jumps two lines down and
> repeats the delay and downward jump two or three times, then jumps
> to two lines above the bottom line of the screen.  After a lengthy
> delay it jumps to the bottom line.
> 
> After a much longer delay the cursor jumps three spaces to the right
> and never moves after that point and is unresponsive, although
> CTL-Alt-Delete can still cause a BIOS reset and eventual attempt at
> reboot.

Could you post a video of the boot process, or a photo or text of where
the boot gets stuck? I do not see the point, however, if all it would
show is the "BTX loader ..." text as quoted.

Does enabling "verbose" booting show any more text? Or can you not
even reach the stage where that can be enabled?
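
If the loader menu or prompt is never reached, one thing that might be
worth trying is setting the verbose knobs in "/boot/loader.conf" from
another system. This is a sketch only; both variables are documented
in loader.conf(5), but the hang may well happen before the file is
read, in which case they change nothing.

```shell
# /boot/loader.conf fragment (sketch; may be too early in the boot to help)
boot_verbose="YES"       # same effect as "boot -v" at the loader prompt
verbose_loading="YES"    # have the loader print more about what it loads
```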


> Since posting the above I have taken the pair of boot drives out of
> the tower
...
> connected that to the laptop, and rewritten the boot code onto each
> with 'gpart bootcode', but that doesn't seem to have changed anything.

What was the exact command used to set up the bootcode?
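
For reference, on a BIOS system with GPT disks and root on ZFS, I
would expect something of the form printed by the sketch below. The
disk names and the partition index are placeholders, not your real
values; the index must be that of the "freebsd-boot" partition as shown
by "gpart show". The script only prints the commands so that they can
be checked, and posted, before anything is written to disk.

```shell
#!/bin/sh
# Dry run: prints the bootcode commands instead of running them.
DISKS=${DISKS:-"ada0 ada1"}   # hypothetical device names of the two boot disks
INDEX=${INDEX:-1}             # hypothetical index of the freebsd-boot partition

bootcode_cmds() {
    for d in $DISKS; do
        # pmbr goes into the protective MBR, gptzfsboot into the
        # freebsd-boot partition selected with -i
        echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i $INDEX $d"
    done
}

bootcode_cmds
```

The files under /boot should be the ones from the 13.3 install. If the
boot file system were UFS instead, "/boot/gptboot" would take the place
of "/boot/gptzfsboot".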


- Anubhav

--