Re: 13.3R's installworld killed system--please help!
- In reply to: ax61_a_disroot.org: "Re: 13.3R's installworld killed system--please help!"
Date: Mon, 09 Sep 2024 02:13:19 UTC
ax61@disroot.org wrote:

Thank you for replying!

> In short all I can give is a vague suggestion: fix|update the bootcode
> for FreeBSD 13.3, along with any "/boot/loader.conf" & "/etc/rc.conf"
> settings which could be causing issues.

What, if anything, has changed in the boot code between 13.2 and 13.3?
After all, 13.2 worked just fine.  It's 13.3 that appears to be broken.

The contents of the tower's /boot/loader.conf, /etc/rc.conf{,.local},
and /etc/sysctl.conf aren't currently available to me.  I would have to
take the drives out of the tower, connect them to the laptop in the
storage docking station, and then import the "system" pool into the
laptop using the -f option.  I may have to do that, though forced
importation of the boot pool onto another machine worries me, but first
I'd like clues as to what I would then be looking for.

> I am recreating this reply; had deleted the original mails already.
> Reference:
> https://lists.freebsd.org/archives/freebsd-questions/2024-September/005640.html
> https://lists.freebsd.org/archives/freebsd-stable/2024-September/002368.html
>
> I have heavily reformatted & edited the original mail to make sense of
> the situation.  My questions are for my own clarification, or to gain
> information for others to possibly help.

Okay.  I'll delete plenty of stuff to avoid wearing down other
potential readers.

> Scott B wrote ...
>
> AMD laptop - Email use
> - FreeBSD 13.1*
> - SanDisk 477 GB SSD -- EXternal, SATA 2 INternal, SATA 2
>
> Dell Exterme - tower
> - FreeBSD 12.2*
> - 2x 0.9 TB HDD -- INternal, SATA 2
> - 6x 1.86+ TB HDDs
> -- 2x 1.86+ TB -- INternal, SATA 2
> -- 3x 1.86+ TB -- INternal, USB 3
> -- 1x 1.86+ TB -- EXternal, SATA 1

> > The two smallest drives are the boot devices and are each
> > partitioned with the boot loader's UFS2 partition,
> > a partition containing one component of a two-way ZFS mirror that
> > is a boot partition with a pool name of "system".
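For what it's worth, the forced import mentioned above can be done
fairly safely if the pool is imported read-only and nothing is mounted
automatically.  A sketch, assuming the GELI providers have already been
attached and that the root dataset follows a typical bsdinstall-style
layout (the dataset name "system/ROOT/default" and the /mnt altroot are
assumptions; adjust to whatever "zfs list -r system" actually shows):

```shell
# Import the boot pool read-only under an alternate root so nothing on
# it can be modified.  -N skips mounting datasets; -f overrides the
# "pool was last accessed by another system" check.
zpool import -f -N -R /mnt -o readonly=on system

# Mount just the root dataset read-only and inspect the configs.
mount -t zfs -o ro system/ROOT/default /mnt
less /mnt/boot/loader.conf /mnt/etc/rc.conf

# Detach cleanly when done.
umount /mnt
zpool export system
```

With readonly=on the on-disk state (including the hostid/"last used"
metadata that makes forced imports worrying) is left untouched.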
> > There is also a small partition on each for the crash dump area on
> > one and /var/crash (UFS2) on the other.  There is also a 2 GB
> > partition on each drive for a GEOM mirror that supports a UFS2
> > partition for an application.
> >
> > Lastly, most of the remaining space on the two small drives is a
> > two-way ZFS mirror containing /usr/home and potentially other file
> > systems.  That pool's name is "local".

Here are the gpart backup files of the GPT partition tables.  I guess
I should have included them in my original posting.

The following partition table backup is for the drive normally at ada0
in the tower.

GPT 128
 1  freebsd-boot          40        1024  gptboot0
 2  freebsd-zfs         2048   721420288  system0
 4  freebsd-swap   721422336    25163776  swap0
 6  freebsd-zfs    746586112  1149239296  local1
 8  freebsd-ufs   1895826472     4194304  dbtor0
13  freebsd-swap  1900020776    52428800  crashdump

The next partition table backup is for the drive normally at ada1 in
the tower.

GPT 152
 1  freebsd-boot          40        1024  gptboot1
 2  freebsd-zfs         2048   721420288  system1
 4  freebsd-swap   721422336    25163776  swap1
 6  freebsd-zfs    746586112  1149239296  local0
 8  freebsd-ufs   1895826472     4194304  dbtor1
13  freebsd-ufs   1900020776    52428800  varcrash

> > The remaining drives hold several things, including the two
> > remaining pools ("rz7A", a raidz2 pool with 6 components totalling
> > ~10.4 TB, and "zmisc", comprising two mirrored vdevs and totalling
> > 99 GB) and three small GEOM mirrors of varying sizes
> > GEOM-concatenated together to hold a UFS2 file system for a work
> > area for ccache trees and WRKDIRPREFIX for portmaster(8).
> >
> > "system", "local", and "rz7A" are all on GELI-encrypted partitions.
> > "zmisc" is not encrypted.

> Could you post the partition layout by "gpart" or whatever else
> ("fdisk"?) works?

See above.

> Partitions as I understood on 2x small disks ...
> UFS
> ----
> - boot - ? GB
> - crash-dump - ? GB
> - var/crash - ? GB
> - <application> - 2 GB GEOM mirror

The 2 GB partitions are a GELI-encrypted mirror of /var/db/tor.  The
reason for putting this directory on a UFS2 file system is that if
keys must be overwritten, they can be, whereas they could not be
overwritten in a ZFS data set.

> ZFS
> ----
> - "system" - ZFS mirror, bootfs - ? GB; GELI
> - "local" - ZFS mirror; /usr/home + etc - rest GB; GELI
>
> Presumably on some combination of the larger, 1.86+ TB disks with far
> too many partitions without knowing the layout; partitions ...

At present the following pools and UFS2 file system are not accessible.

> ZFS
> ----
> - "rz7A" - RAID-Z2 ~10.4 TB; GELI
> - "zmisc" - ZFS mirror 2x, 99 GB
> UFS
> ---
> - concat'd 3x GEOM mirror, gmirror, as UFS2 fs

> > For many years I have been upgrading FreeBSD from source, but
> > decided to try the freebsd-update(8) process when 13.2-RELEASE was
> > released.

> Looks like that was on the laptop (see later for context)?

Yes, it was.  The result was such a mess that I decided never to do it
that way again, which is a pity because I run a GENERIC kernel on the
laptop, so it would have been a nice time saver.  The tower has a
tailored kernel.

> > [stuff deleted --SB]

> "freebsd-update" asks about how to merge files or informs that it
> would not touch the files (that is based on my experience on FreeBSD
> 1[34]; do not know, or care, how it behaved before).

Like I said, lesson learned.

> > Meanwhile I had proceeded with source upgrade of the tower to
> > 12.3-RELEASE-p[x].

> #Dell Exterme- is now on 12.3*

> > When 13.3-RELEASE became available, I first did a source upgrade on
> > the laptop.  Running "make installworld" rendered the laptop
> > unbootable.

> What steps were taken during source upgrade?  What were the steps
> before "make installworld"?

buildworld, buildkernel, installkernel, reboot, etcupdate,
installworld, etcupdate, etcupdate resolve, reboot failure.
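For readers following along, that step list corresponds to the
procedure documented in /usr/src/UPDATING.  A sketch of the usual
sequence (not a transcript of what was actually run; the -j value and
KERNCONF are illustrative):

```shell
# Typical FreeBSD source-upgrade sequence, per /usr/src/UPDATING.
# Run as root from /usr/src.
cd /usr/src
make -j"$(sysctl -n hw.ncpu)" buildworld
make -j"$(sysctl -n hw.ncpu)" buildkernel   # add KERNCONF=... for a custom kernel
make installkernel                          # likewise KERNCONF=... here
shutdown -r now     # reboot onto the new kernel; single-user mode is safest

cd /usr/src
etcupdate -p        # pre-installworld pass: merge files installworld depends on
make installworld
etcupdate -B        # full merge of /etc against the new source tree
etcupdate resolve   # interactively resolve any remaining conflicts
shutdown -r now
```

Note that none of the etcupdate steps touch the boot blocks; a hang
before the kernel loads points at the loader/bootcode stage, as the
rest of the thread discusses.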
> > but eventually I removed the SSD from it and loaded it into a USB
> > 3.0 docking station and attached it to the tower and replicated its
> > entire pool ("sysroot") into my largest pool attached to the tower.

> "sysroot" just made its appearance!

That is the only pool on the SSD, courtesy of the braindead ZFS
installer in bsdinstall(8).  It needed to have a different name from
the tower's "system" to avoid conflicts when backing it up onto the
tower.

> > Then I reinstalled the SSD into the laptop.

> > After downloading the 13.3-RELEASE ... writing it to a thumb drive,
> > I booted that on the laptop and installed 13.3-RELEASE from scratch.

> #AMD laptop- is now on 13.3-R

> > That experience delayed my upgrading the tower to 13.3-RELEASE-p1
> > for several months, although I had compiled it on the tower and had
> > it ready to install, but first I had upgraded the tower to
> > 12.4-RELEASE-p2.

I should point out that 12.4-RELEASE was, IMHO, a very good and
reliable release.  My experiences with 13.1-RELEASE and 13.2-RELEASE
were limited, but generally also good.  13.3-RELEASE is the problem.

I was becoming concerned about continuing to run 12.4-RELEASE-p2 for
so many months past its expiration.  Unfortunately the tower's ancient
BIOS is unable to boot from a thumb drive.  I believe it would still
boot from a CD or DVD, but I now have no way to burn an installer disk
to feed it.

> #Dell Exterme- 12.4*

> > Finally I dared to try it a day and several hours ago.  I first
> > created a boot environment to preserve the current system and also
> > made a snapshot of all ZFS file systems in order to have a
> > potential rollback point before beginning the installkernel step.

> #Dell Exterme- in process of updating to 13.3

> > I did that, completed the etcupdate steps, and then did a "shutdown
> > -r now".

> What options(modes) of "etcupdate" were used & when?
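The "boot environment plus recursive snapshot" safety net described
above, and the earlier pool replication onto the tower, would look
roughly like this (all names here -- "pre13.3", "backup",
"rz7A/laptop" -- are illustrative, not what was actually typed):

```shell
# Preserve the current system as a boot environment, then take
# recursive snapshots of every pool as a rollback point before
# installkernel.
bectl create pre13.3
zfs snapshot -r system@pre13.3
zfs snapshot -r local@pre13.3

# Replicating the laptop's pool into a pool on the tower is typically
# a recursive send/receive; -R preserves the whole dataset tree and
# its properties, -u keeps the received datasets unmounted, -d maps
# dataset names under the destination.
zfs snapshot -r sysroot@backup
zfs send -R sysroot@backup | zfs recv -u -d rz7A/laptop
```

Keeping the received datasets unmounted (-u) matters here, since the
replicated pool carries mountpoint properties like "/" that must not
take effect on the receiving machine.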
I don't have access to the script(1) file because it is in the zmisc
pool, so I don't recall for certain the options used.  I *think* I ran

# etcupdate -F diff
# etcupdate resolve

In any case, I don't see how that would kill the boot process so early
on, before there is even a kernel loaded.

> Looks like did not update the boot loader.

Looks like what did not update it?  My understanding was that
rewriting the boot code was a step incorporated into installworld at
least a couple of major releases ago.

> Hopefully did not update the ZFS boot pools?

As in "zpool upgrade -a"?  Not to my knowledge, no.

> > After the boot loader asks me for the GELI passphrase for the boot
> > pool, I now get the following.
> >
> > Calculating GELI Decryption Key for disk0p2: 1563240 iterations...
> >
> > BTX loader 1.00  BTX version is 1.02
> >
> > After those two lines the blinking cursor jumps up three lines and
> > moves to the beginning of the line--not sure which happens first
> > because it's too fast.
> >
> > After a delay of several seconds, it jumps two lines down and
> > repeats the delay and downward jump two or three times, then jumps
> > to two lines above the bottom line of the screen.  After a lengthy
> > delay it jumps to the bottom line.
> >
> > After a much longer delay the cursor jumps three spaces to the
> > right and never moves after that point and is unresponsive,
> > although Ctrl-Alt-Delete can still cause a BIOS reset and eventual
> > attempt at reboot.

> Could you post a video of the boot process; or a photo or text where
> the boot has stuck?  I do not see a point, however, if all that would
> show is the "BTX loader ..." text as quoted.

Not easily.

> Does enabling "verbose" booting show any more text?  Or, could it not
> even reach the stage to enable that?

AFAICT, it never gets that far.

> > Since posting the above I have taken the pair of boot drives out of
> > the tower
> ...
> > connected that to the laptop, and rewritten the boot code onto each
> > with 'gpart bootcode', but that doesn't seem to have changed
> > anything.

> How/What was the exact command used to set up the bootcode?

Although the gpart(8) man page shows doing that in two successive
commands for each disk, like

gpart bootcode -b /boot/pmbr da0
gpart bootcode -b /boot/pmbr da1
gpart bootcode -p /boot/gptzfsboot -i 1 da0
gpart bootcode -p /boot/gptzfsboot -i 1 da1

I believe I combined the two operations for each drive into a single
command:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1

Note that da0 and da1 were the drive addresses in the docking station
when connected to the laptop.

I have never had such a total failure before after an upgrade from
source.  I hope that 13.4-RELEASE and the 14.x releases do not exhibit
this problem.

Scott
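One way to confirm the rewritten bootcode actually landed (and that it
is the laptop's 13.3 gptzfsboot, not a stale copy) is to read the
freebsd-boot partition back and compare it byte-for-byte against the
file that was written.  A sketch, using the da0/da1 names from the
docking station:

```shell
# The freebsd-boot partition (index 1 on each drive) is larger than
# /boot/gptzfsboot itself, so compare only the file's length.
sz=$(stat -f %z /boot/gptzfsboot)     # file size in bytes (BSD stat)
for d in da0 da1; do
    if dd if=/dev/${d}p1 bs=1 count="$sz" 2>/dev/null \
            | cmp -s - /boot/gptzfsboot; then
        echo "${d}: bootcode matches /boot/gptzfsboot"
    else
        echo "${d}: bootcode DIFFERS from /boot/gptzfsboot"
    fi
done
```

If both drives match, the remaining suspects are the loader files
inside the "system" pool (/boot/loader, /boot/zfsloader) rather than
the partition bootstrap, which would fit a hang at the "BTX loader"
stage.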