[Bug 281547] nvd->nda + /dev/diskid + zfs triggers locking issues and partition not in /dev

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 16 Sep 2024 23:21:35 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281547

            Bug ID: 281547
           Summary: nvd->nda + /dev/diskid + zfs triggers locking issues
                    and partition not in /dev
           Product: Base System
           Version: 14.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: russell.stuart@akips.com

Created attachment 253614
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=253614&action=edit
Problem 1 configuration just before reboot

The issue is that after upgrading from FreeBSD 13 -> 14, the FreeBSD 14 system
fails to boot.

Two examples are given here that look different, but they share a lot of common
factors and workarounds are the same, so I suspect the underlying cause is the
same.  The workaround for both is adding either or both of these lines to
/boot/loader.conf and ensuring they are preserved during the 13 -> 14 upgrade:

  a.  hw.nvme.use_nvd="1"
  b.  kern.geom.label.disk_ident.enable="0"
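
For anyone scripting the workaround, here is a minimal sketch of adding both
tunables to the text of /boot/loader.conf without duplicating entries that are
already there.  The function name ensure_tunables is hypothetical (it is not
part of any FreeBSD tool), and the parsing assumes simple name="value" lines.

```python
# Assumption: the two workaround tunables quoted in this report.
WORKAROUND = {
    "hw.nvme.use_nvd": "1",
    "kern.geom.label.disk_ident.enable": "0",
}


def ensure_tunables(conf_text, tunables=WORKAROUND):
    """Return conf_text with every tunable set to the requested value."""
    seen = set()
    out = []
    for line in conf_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name in tunables:
            # Rewrite an existing assignment in place.
            out.append('%s="%s"' % (name, tunables[name]))
            seen.add(name)
        else:
            # Leave comments, blank lines and unrelated settings alone.
            out.append(line)
    # Append any tunable that was not already present.
    for name, value in tunables.items():
        if name not in seen:
            out.append('%s="%s"' % (name, value))
    return "\n".join(out) + "\n"
```

Running it twice over the same text gives the same result, so it should be
safe to call both before and after the upgrade step.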

I suspect https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241541 is the same
issue.

Problem 1: After the upgrade from FreeBSD 13 -> 14, 14 fails to boot.  The
console displays problem-1-screenshot.png after the failure happens.  It
contains the ZFS WARNING lines shown below repeated many times, whereupon it
gives up and displays the "mountroot>" prompt:

    ZFS WARNING: Unable to open diskid/DISK-2128309CF86Cp3 for writing (error=1)
    ZFS WARNING: Unable to open diskid/DISK-2128309CF86Cp4 for writing (error=1)
    ZFS WARNING: Unable to open gpt/akips-home for writing (error=1)
    Mounting from zfs:akips/ROOT/1 failed with error 1

    Loader variables:
      vfs.root.mountfrom=zfs:akips/ROOT/1

    <....elided....>

    mountroot>
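
Assuming the error= value in these warnings is a standard errno, error=1 would
be EPERM.  The sketch below pulls the provider names and error codes out of
captured console text; the function name failed_opens is hypothetical, and the
pattern tolerates the "(error=N)" part being wrapped onto the next line.

```python
import errno
import re

# Matches the warning text quoted above; \s+ also spans a line wrap.
PATTERN = re.compile(r"Unable to open (\S+) for writing\s+\(error=(\d+)\)")


def failed_opens(console_text):
    """Return (provider, errno name) pairs for each ZFS open failure."""
    return [
        (dev, errno.errorcode.get(int(code), "unknown"))
        for dev, code in PATTERN.findall(console_text)
    ]
```

Note that the errno names come from the machine running the script, which is
only an approximation if that is not the FreeBSD host that produced the log.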

I won't describe how here, but from this state I've rebooted back to FreeBSD 13
and got the same error.

Problem 2: After the upgrade from 13 -> 14, 14 fails to boot.  The console
displays problem-2-screenshot.png after the failure happens.  The error message
says it fails to boot because /dev/nda0p1 can't be mounted as /boot/efi.  The
reason it fails to mount is that /dev/nvd0p1 doesn't exist.  While /dev/nvd0
does exist (it's a symlink to nda0), no partition devices exist (i.e., neither
nda0p1 nor nvd0p1), so attaching the swap partition /dev/nvd0p2 also failed.  All
files under /dev after the reboot are shown in the attachment
problem-2-lr-lR-dev.txt.  Since this is a temporary boot environment, rebooting
reverts back to the FreeBSD 13 root partition that worked moments ago.  It now
fails with the same error.
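
To check a /dev listing such as the attached one for the missing partition
nodes, something like the following sketch could be used.  The function name
missing_partition_nodes is hypothetical, and the default disk names and
partition numbers are assumptions taken from the nodes mentioned above, not
from the full disk layout.

```python
def missing_partition_nodes(dev_names, disks=("nda0", "nvd0"), partitions=(1, 2)):
    """Return the expected partition nodes that are absent from dev_names."""
    present = set(dev_names)
    return [
        "%sp%d" % (disk, part)
        for disk in disks
        for part in partitions
        if "%sp%d" % (disk, part) not in present
    ]
```

For the failing state described here, where nda0 and nvd0 exist but no
partition devices do, it would report all four of nda0p1, nda0p2, nvd0p1 and
nvd0p2.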


Reproduction
------------

Some bits are common to both:

Hardware:

- amd64.
- A single nvme drive, at least 50GB.
- No other drives connected.
- 8GB RAM.


Software:

- Very specific disk layout (see the problem descriptions below).
- Doing an upgrade from FreeBSD 13 (nvd) to FreeBSD 14 (nda).


I can provide USB and .iso boot images that reproduce the problem if you have
the hardware described above.  I reproduced it on a NUC and VMWare.  I suspect
I could reproduce it on QEMU if I could figure out a way to configure an nvme
drive that SeaBIOS/OVMF could boot off and FreeBSD 13/14 recognised.  I can get
either one of those working, but not both at the same time.


Reproducing Problem 1
---------------------

This only happens when using BIOS firmware and the disk layout is as shown in
the attachment problem-1-config.txt.  That is the state of play just after a
new install, which is an upgrade from FreeBSD 13 to 14, has completed and just
before the reboot onto the new temporary boot environment, akips/ROOT/1.

After the state shown in problem-1-screenshot.png is reached, if you "tickle"
geom/cam by plugging in a bootable USB stick (see
problem-1-screenshot-tickle.png) then type the right incantations into the
"mountroot>" prompt, zfs finds the root partition and the machine boots
successfully.  Once it's booted successfully the error doesn't happen any more.

Reproducing Problem 2
---------------------

This only happens when using EFI firmware and the disk layout is as shown in
the attachment problem-2-config.txt.  This is the state of play just after a
new install, which is an upgrade from FreeBSD 13 to 14, has completed.  The
next step is to reboot onto the new temporary boot environment, akips/ROOT/1.


Other notes
-----------

Both these are upgrades from a fresh install of FreeBSD 13 to 14.  In both
cases, if you do a fresh install of the FreeBSD 14 image onto the same hardware
under the same conditions it works.
