Re: How to get past "internal error: cannot import 'zroot': Integrity check failed" (no ability to import the pool)?

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 28 Aug 2022 00:18:09 UTC
On 2022-Aug-24, at 20:57, Mark Millard <marklmi@yahoo.com> wrote:

> I seem to have gotten into a state where no zpool
> command that requires identifying a pool (such as
> by name) can work, because import cannot make the pool
> available. (I give more context later.)
> 
> How do I get the freebsd-zfs partition back into
> a form that I can repopulate when its failed pool
> cannot be imported? I'm apparently limited to
> zpool commands that reference the device instead of the
> pool (name), because the likes of "zpool import -f -FX
> . . ." lead to a panic.
> 
> Note that this was for media that used zfs just to use
> bectl, not for other typical zfs reasons. For example,
> redundancy was-not/is-not a goal. For reference:
> 
> =>        40  3907029088  da0  GPT  (1.8T)
>          40       32728       - free -  (16M)
>       32768      524288    1  efi  (256M)
>      557056     7340032    2  freebsd-swap  (3.5G)
>     7897088    26214400       - free -  (13G)
>    34111488    20971520    3  freebsd-swap  (10G)
>    55083008    12582912       - free -  (6.0G)
>    67665920    29360128    4  freebsd-swap  (14G)
>    97026048     4194304       - free -  (2.0G)
>   101220352    33554432    5  freebsd-swap  (16G)
>   134774784    67108864    6  freebsd-swap  (32G)
>   201883648   364904448    7  freebsd-swap  (174G)
>   566788096  2795503616    8  freebsd-zfs  (1.3T)
>  3362291712   544737416       - free -  (260G)
> 
> At this point, making no attempt to preserve the content
> of the freebsd-zfs partition seems the likely way to go.
> But I'm unclear on how to even start over, given no
> ability to make the pool accessible by name.
> 
> The sequence that led to the current state went like . . .
> 
> # git -C /usr/ports fetch
> error: error reading from .git/objects/pack/pack-8e819c78469vm_fault: pager read error, pid 1370 (git)
> c212148fe5d3922cc807e6858768e.pack: Input/output error
> vm_fault: pager read error, pid 1370 (git)
> . . .
> 
> # bectl activate main-CA72
> panic: VERIFY3(0 == bpobj_open(&bpo, dl->dl_os, dlce->dlce_bpobj)) failed (0 == 97)
> 
> cpuid = 0
> time = 1661389515
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x30
>         pc = 0xffff0000007fcfd0  lr = 0xffff000000101b80
>         sp = 0xffff0000b49f4ee0  fp = 0xffff0000b49f50e0
> 
> db_trace_self_wrapper() at vpanic+0x178
>         pc = 0xffff000000101b80  lr = 0xffff0000004cef08
>         sp = 0xffff0000b49f50f0  fp = 0xffff0000b49f5150
> 
> vpanic() at spl_panic+0x40
>         pc = 0xffff0000004cef08  lr = 0xffff00000129f360
>         sp = 0xffff0000b49f5160  fp = 0xffff0000b49f51f0
> 
> spl_panic() at dsl_deadlist_space_range+0x264
>         pc = 0xffff00000129f360  lr = 0xffff00000133d4f4
>         sp = 0xffff0000b49f5200  fp = 0xffff0000b49f53c0
> 
> dsl_deadlist_space_range() at snaplist_space+0x4c
>         pc = 0xffff00000133d4f4  lr = 0xffff00000133899c
>         sp = 0xffff0000b49f53d0  fp = 0xffff0000b49f5440
> 
> snaplist_space() at dsl_dataset_promote_check+0x648
>         pc = 0xffff00000133899c  lr = 0xffff0000013385e8
>         sp = 0xffff0000b49f5450  fp = 0xffff0000b49f5530
> 
> dsl_dataset_promote_check() at dsl_sync_task_sync+0xcc
>         pc = 0xffff0000013385e8  lr = 0xffff00000135fb3c
>         sp = 0xffff0000b49f5540  fp = 0xffff0000b49f5590
> 
> dsl_sync_task_sync() at dsl_pool_sync+0x3cc
>         pc = 0xffff00000135fb3c  lr = 0xffff00000135251c
>         sp = 0xffff0000b49f55a0  fp = 0xffff0000b49f55e0
> 
> dsl_pool_sync() at spa_sync+0x8f8
>         pc = 0xffff00000135251c  lr = 0xffff00000138be28
>         sp = 0xffff0000b49f55f0  fp = 0xffff0000b49f57f0
> 
> spa_sync() at txg_sync_thread+0x1d8
>         pc = 0xffff00000138be28  lr = 0xffff0000013a2bf8
>         sp = 0xffff0000b49f5800  fp = 0xffff0000b49f58f0
> 
> txg_sync_thread() at fork_exit+0x88
>         pc = 0xffff0000013a2bf8  lr = 0xffff00000047c568
>         sp = 0xffff0000b49f5900  fp = 0xffff0000b49f5950
> 
> fork_exit() at fork_trampoline+0x14
>         pc = 0xffff00000047c568  lr = 0xffff00000081edd4
>         sp = 0xffff0000b49f5960  fp = 0x0000000000000000
> 
> KDB: enter: panic
> [ thread pid 4 tid 100198 ]
> Stopped at      kdb_enter+0x48: undefined       f907011f
> 
> I was unable to boot from the media after this.
> 
> Plugged the media into another machine . . .
> 
> # zpool import -F -n zroot
> cannot import 'zroot': pool was previously in use from another system.
> Last accessed by <unknown> (hostid=0) at Wed Aug 24 18:05:15 2022
> The pool can be imported, use 'zpool import -f' to import the pool.
> 
> # zpool import -f zroot
> Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1588]: failed to load zpool zroot
> Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1612]: failed to load zpool zroot
> Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1616]: failed to load zpool zroot
> internal error: cannot import 'zroot': Integrity check failed
> Abort trap (core dumped)
> 
> # gdb zpool zpool.core
> . . .
> Core was generated by `zpool import -f zroot'.
> Program terminated with signal SIGABRT, Aborted.
> Sent by thr_kill() from pid 1716 and user 0.
> #0  thr_kill () at thr_kill.S:4
> 4       RSYSCALL(thr_kill)
> (gdb) bt
> #0  thr_kill () at thr_kill.S:4
> #1  0x00002b0c6f3794f0 in __raise (s=s@entry=6) at /usr/main-src/lib/libc/gen/raise.c:52
> #2  0x00002b0c6f420494 in abort () at /usr/main-src/lib/libc/stdlib/abort.c:67
> #3  0x00002b0c69415744 in zfs_verror (hdl=0x2b0c76263000, error=2092, fmt=fmt@entry=0x2b0c693d3135 "%s", ap=...) at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:344
> #4  0x00002b0c69416324 in zpool_standard_error_fmt (hdl=hdl@entry=0x2b0c76263000, error=error@entry=97, fmt=0x2b0c693d3135 "%s") at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:729
> #5  0x00002b0c69415ec8 in zpool_standard_error (hdl=0x0, hdl@entry=0x2b0c76263000, error=0, error@entry=97, msg=0x2b0c6ea23350 <__thr_sigprocmask> "\377\203", 
>    msg@entry=0x2b0c665668e8 "cannot import 'zroot'") at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:619
> #6  0x00002b0c6940687c in zpool_import_props (hdl=0x2b0c76263000, config=config@entry=0x2b0c95939080, newname=newname@entry=0x0, props=props@entry=0x0, flags=flags@entry=2)
>    at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_pool.c:2193
> #7  0x00002b0be60f3344 in do_import (config=0x2b0c95939080, newname=0x0, mntopts=0x0, props=props@entry=0x0, flags=flags@entry=2) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3190
> #8  0x00002b0be60f3108 in import_pools (pools=pools@entry=0x2b0c762780e0, props=<optimized out>, mntopts=mntopts@entry=0x0, flags=flags@entry=2, orig_name=0x2b0c7622d028 "zroot", new_name=0x0, 
>    do_destroyed=do_destroyed@entry=B_FALSE, pool_specified=pool_specified@entry=B_TRUE, do_all=B_FALSE, import=0x2b0c665684a0) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3318
> #9  0x00002b0be60e9074 in zpool_do_import (argc=1, argv=<optimized out>) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3804
> #10 0x00002b0be60e3ce8 in main (argc=4, argv=<optimized out>) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:10918
> (gdb) quit
> 
> 
> # zpool import -f -FX -N -R /zroot-mnt -t zroot zprpi
> . . . eventually . . .
> panic: Solaris(panic): zfs: adding existent segment to range tree (offset=7a001ba000 size=40000)
> cpuid = 8
> time = 1661395806
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> vcmn_err() at vcmn_err+0x10c
> zfs_panic_recover() at zfs_panic_recover+0x64
> range_tree_add_impl() at range_tree_add_impl+0x184
> range_tree_walk() at range_tree_walk+0xa4
> metaslab_load() at metaslab_load+0x6a4
> metaslab_preload() at metaslab_preload+0x8c
> taskq_run() at taskq_run+0x1c
> taskqueue_run_locked() at taskqueue_run_locked+0x190
> taskqueue_thread_loop() at taskqueue_thread_loop+0x130
> fork_exit() at fork_exit+0x88
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 6 tid 108968 ]
> Stopped at      kdb_enter+0x44: undefined       f907c27f
> db> 
> 
> I had to unplug the disk to avoid reboots simply retrying
> the import and crashing the same way again.
> 
> SIDE NOTE: Then, on reboot, I saw the following:
> . . .
> Setting hostid: 0x6522bfc4.
> cannot import 'zroot': no such pool or dataset
>        Destroy and re-create the pool from
>        a backup pid 49 (zpool) is attempting to use unsafe AIO requests - not logging anymore
> pid 49 (zpool), jid 0, uid 0: exited on signal 6
> source.
> cachefile import failed, retrying
> nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
> ASSERT at /usr/main-src/sys/contrib/openzfs/module/nvpair/fnvpair.c:592:fnvpair_value_nvlist()Abort trap
> Import of zpool cache /etc/zfs/zpool.cache failed, will retry after root mount hold release
> cannot import 'zroot': no such pool or dataset
>        Destroy and re-create the pool from
>        a backup source.
> cachefile imporpid 55 (zpool), jid 0, uid 0: exited on signal 6
> t failed, retrying
> nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
> ASSERT at /usr/main-src/sys/contrib/openzfs/module/nvpair/fnvpair.c:592:fnvpair_value_nvlist()Abort trap
> Starting file system checks:
> /dev/gpt/CA72opt0EFI: 281 files, 231 MiB free (14770 clusters)
> FIXED
> . . .
> 
> Removing /etc/zfs/zpool.cache let later reboots avoid those import attempts.
> END SIDE NOTE.
> 
> 
> For reference, for the machine where I can plug in
> the media:
> 
> # uname -apKU # line split for better readability
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #59
> main-n256584-5bc926af9fd1-dirty: Wed Jul  6 18:10:52 PDT 2022
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
> arm64 aarch64 1400063 1400063
> 

I ended up doing:

# zpool labelclear -f /dev/da0p8
# zpool create -o compatibility=openzfs-2.1-freebsd -O compress=lz4 -O atime=off -f -t zprpi zroot /dev/da0p8
# zpool export zprpi

and then doing my normal update procedure to the media (sends and some
adjustments), making zroot a variant of what is on another machine
(under a different pool name).

And . . . the media is back to being importable, even bootable.
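
For anyone else landing in the same state, the three commands above can be
sketched as a small script with the device and names as variables. This is
only a sketch: it prints the commands instead of running them, because
labelclear and create throw away whatever is on the partition. DEV, POOL,
TMPNAME and the function name recreate_cmds are illustrative; the values
are the ones from this message and would need changing for other media.

```shell
#!/bin/sh
# Sketch only: prints the commands rather than executing them.
# DEV, POOL and TMPNAME are assumptions taken from this message.
DEV=/dev/da0p8      # the freebsd-zfs partition holding the damaged pool
POOL=zroot          # on-disk pool name the media should end up with
TMPNAME=zprpi       # temporary in-core name while working on this machine

recreate_cmds() {
    # Clear the old ZFS labels by device, since the pool cannot be
    # referenced by name when it will not import.
    echo "zpool labelclear -f ${DEV}"
    # Create the replacement pool. With -t the pool is known as TMPNAME
    # on this machine while the on-disk name is POOL.
    echo "zpool create -o compatibility=openzfs-2.1-freebsd -O compress=lz4 -O atime=off -f -t ${TMPNAME} ${POOL} ${DEV}"
    # Export under the temporary name so the media can be repopulated
    # or moved later.
    echo "zpool export ${TMPNAME}"
}

recreate_cmds
```

After really running such commands, a plain "zpool import" with no arguments
should list the new pool as available for import, which is a cheap check
before repopulating the media.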


===
Mark Millard
marklmi at yahoo.com