ZFS/bectl use appears to have an example of not waiting for "Root mount waiting for: CAM" (aarch64 example)
Date: Fri, 13 Jan 2023 03:17:54 UTC
The failure:

The failure is for making BE 13S-CA72 (in zopt0 on nda1p3) activated (temporarily or not, or selected via "8" in the boot loader) and then attempting to boot. It finds and uses the kernel okay but the "mount root" stage gets:

CPU  7: ARM Cortex-A72 r0p3 affinity:  3  1
Trying to mount root from zfs:zopt0/ROOT/13S-CA72 []...
Mounting from zfs:zopt0/ROOT/13S-CA72 failed with error 2: unknown file system.
CPU  8: ARM Cortex-A72 r0p3 affinity:  4  0

right after the "Trying" message. This is long before the boot sequence later gets to:

Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
nda0 at nvme0 bus 0 scbus4 target 0 lun 1
nda0: <INTEL SSDPE21D960GA E2010480 PHM2911200Z0960CGN>
nda0: Serial Number PHM2911200Z0960CGN
nda0: nvme version 1.0 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
nda0: 915715MB (1875385008 512 byte sectors)
nda1 at nvme1 bus 0 scbus5 target 0 lun 1
nda1: <INTEL SSDPED1D960GAY E2010480 PHMB829600B4960EGN>
nda1: Serial Number PHMB829600B4960EGN
nda1: nvme version 1.0 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
nda1: 915715MB (1875385008 512 byte sectors)

so that nda1p3 is finally available to provide pool zopt0. I use:

kern.cam.boot_delay=10000
vfs.mountroot.timeout=10
vfs.root_mount_always_wait=1

So it appears that ZFS gives up on the partition way too early under at least some condition(s).

At the mountroot> prompt all the BE alternatives in zopt0 fail when the above has happened. An example is:

mountroot> zfs:zopt0/ROOT/main-CA72
Trying to mount root from zfs:zopt0/ROOT/main-CA72 []...
Mounting from zfs:zopt0/ROOT/main-CA72 failed with error 2: unknown file system.

This is despite being able to boot the BE main-CA72 directly. It looks like, once having given up early, it does not get out of that state for the drive/partition during mountroot> activity. For reference, after the failure:

mountroot> ?

List of GEOM managed disk devices:
gpt/CA72opt0ZFS gpt/CA72opt0SWP gpt/CA72opt0EFI nda1p3 nda1p2 nda1p1 nda1
gpt/RPi3swp3p5 gpt/CA72optM2swp174 gpt/CA72optM2swp32 gpt/CA72optM2swp16
gpt/CA72optM2swp14 ufsid/619582a0ef9c00b3 gpt/CA72optM2ufs gpt/CA72optM2swp10
gpt/CA72optM2efi nda0p8 nda0p7 nda0p6 nda0p5 nda0p4 nda0p3 nda0p2 nda0p1 nda0

Scrubbing the pool zopt0 does not find anything to fix. But the context is not set up for redundancy, just to allow bectl use.

I'd also used zfs sends to update a (nearly) duplicate that I keep on a USB3 NVMe drive, well before discovering the issue existed. That duplicate has no problems booting its updated 13S-CA72 BE on an RPi4B.

Creating BE 13S-CA72-copy from an older 13S-CA72 snapshot produced a BE that boots on the example system:

13S-CA72-copy
  zopt0/ROOT/13S-CA72-copy                           -      -          736K  2023-01-12 16:20
    zopt0/ROOT/13S-CA72@to-zprpi-2022-11-16-21-05-58 -      -          1.73G 2022-11-16 21:05

13S-CA72-copy's normal kernel (1301509):
stable/13-n252944-e52aaa644ce1-dirty: Mon Nov 7 09:55:56 PST 2022

13S-CA72's normal kernel (1301510):
stable/13-n253355-d30b57252df8-dirty: Sat Jan 7 01:07:12 PST 2023

Copying 13S-CA72-copy's kernel (1301509-based) into 13S-CA72 and attempting to boot based on it still gets the failure in 13S-CA72. Copying 13S-CA72's kernel (1301510-based) to 13S-CA72-copy and attempting to boot 13S-CA72-copy works just fine.

I've no clue why BE 13S-CA72 is "lucky" enough to show the problem.
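(For reference, a minimal sketch of the sort of commands involved in making and then trying such a copy BE, assuming bectl's create-from-snapshot and temporary-activation forms; the snapshot name is the one shown in the listing below, and the exact invocations used may have differed:

# bectl create -e 13S-CA72@to-zprpi-2022-11-16-21-05-58 13S-CA72-copy
# bectl activate -t 13S-CA72-copy

The -e form clones the new BE from the named snapshot, and -t makes the activation apply only to the next boot, matching the "temporarily or not" activation mentioned earlier.)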
General context information:

The bectl context in question is on a HoneyComb (EDK2 UEFI/ACPI style booting).

# bectl list -s
BE/Dataset/Snapshot                                  Active Mountpoint Space Created

13S-CA72
  zopt0/ROOT/13S-CA72                                -      -          5.38G 2021-09-29 00:57
    zopt0/ROOT/main-CA72@2021-04-28-01:40:48-0       -      -          3.92G 2021-04-28 01:40
  13S-CA72@to-zprpi-2022-11-16-21-05-58              -      -          1.73G 2022-11-16 21:05
  13S-CA72@to-zprpi-2023-01-08-13-18-20              -      -          0     2023-01-08 13:18
  13S-CA72@to-zprpi-2023-01-10-19-14-05              -      -          0     2023-01-10 19:14

13S-CA72-copy
  zopt0/ROOT/13S-CA72-copy                           -      -          736K  2023-01-12 16:20
    zopt0/ROOT/13S-CA72@to-zprpi-2022-11-16-21-05-58 -      -          1.73G 2022-11-16 21:05

13_0R-CA72
  zopt0/ROOT/13_0R-CA72                              -      -          1.80G 2021-09-29 00:45
    zopt0/ROOT/main-CA72@2021-04-28-01:40:48-0       -      -          3.92G 2021-04-28 01:40
  13_0R-CA72@to-zprpi-2022-11-16-21-05-58            -      -          0     2022-11-16 21:05
  13_0R-CA72@to-zprpi-2023-01-08-13-18-20            -      -          0     2023-01-08 13:18
  13_0R-CA72@to-zprpi-2023-01-10-19-14-05            -      -          0     2023-01-10 19:14

13_1R-CA72
  zopt0/ROOT/13_1R-CA72                              -      -          3.52G 2022-03-10 14:24
    zopt0/ROOT/main-CA72@2021-04-28-01:40:48-0       -      -          3.92G 2021-04-28 01:40
  13_1R-CA72@to-zprpi-2022-11-16-21-05-58            -      -          1.61G 2022-11-16 21:05
  13_1R-CA72@to-zprpi-2023-01-08-13-18-20            -      -          0     2023-01-08 13:18
  13_1R-CA72@to-zprpi-2023-01-10-19-14-05            -      -          0     2023-01-10 19:14

main-CA72
  zopt0/ROOT/main-CA72                               NR     /          10.4G 2023-01-06 17:43
  main-CA72@2021-04-28-01:40:48-0                    -      -          3.92G 2021-04-28 01:40
  main-CA72@to-zprpi-2022-11-16-21-05-58             -      -          451M  2022-11-16 21:05
  main-CA72@2023-01-06-17:43:54-0                    -      -          227M  2023-01-06 17:43
  main-CA72@to-zprpi-2023-01-08-13-18-20             -      -          2.68M 2023-01-08 13:18
  main-CA72@to-zprpi-2023-01-10-19-14-05             -      -          696K  2023-01-10 19:14

old-main-CA72
  zopt0/ROOT/old-main-CA72                           -      -          404K  2022-11-06 20:28
    zopt0/ROOT/main-CA72@2023-01-06-17:43:54-0       -      -          227M  2023-01-06 17:43
  old-main-CA72@to-zprpi-2023-01-08-13-18-20         -      -          0     2023-01-08 13:18
  old-main-CA72@to-zprpi-2023-01-10-19-14-05         -      -          0     2023-01-10 19:14

The boot media here looks like the below as seen via "gpart show -pl":

=>        40  1875384928  nda1    GPT          (894G)
          40      532480  nda1p1  CA72opt0EFI  (260M)
      532520        2008  - free -             (1.0M)
      534528   515899392  nda1p2  CA72opt0SWP  (246G)
   516433920    20971520  - free -             (10G)
   537405440  1337979528  nda1p3  CA72opt0ZFS  (638G)

(nda1 is an Optane 960GB in the PCIe slot in the HoneyComb. nda1p3 is the partition holding pool zopt0.)

(Note: nda0 is a ufs based boot media that is not what I normally use.)

===
Mark Millard
marklmi at yahoo.com