How to get past "internal error: cannot import 'zroot': Integrity check failed" (no ability to import the pool)?

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 25 Aug 2022 03:57:00 UTC
I seem to have gotten into a state where no zpool-related
command that requires identifying a pool (such as by name)
can work, because import cannot make the pool available.
(I give more context later.)

How do I re-establish the freebsd-zfs partition into
a form that I can repopulate, given that its failed pool
cannot be imported? I'm apparently limited to zpool
commands that reference the device instead of the pool
(name), because the likes of "zpool import -f -FX
. . ." lead to a panic.

Note that this media used ZFS only in order to be able to
use bectl, not for other typical ZFS reasons. For example,
redundancy was not and is not a goal. For reference:

=>        40  3907029088  da0  GPT  (1.8T)
          40       32728       - free -  (16M)
       32768      524288    1  efi  (256M)
      557056     7340032    2  freebsd-swap  (3.5G)
     7897088    26214400       - free -  (13G)
    34111488    20971520    3  freebsd-swap  (10G)
    55083008    12582912       - free -  (6.0G)
    67665920    29360128    4  freebsd-swap  (14G)
    97026048     4194304       - free -  (2.0G)
   101220352    33554432    5  freebsd-swap  (16G)
   134774784    67108864    6  freebsd-swap  (32G)
   201883648   364904448    7  freebsd-swap  (174G)
   566788096  2795503616    8  freebsd-zfs  (1.3T)
  3362291712   544737416       - free -  (260G)

At this point, trying to preserve the content of the
freebsd-zfs partition does not seem like a workable way
to go. But I'm unclear on how to even start over, given
that I cannot make the pool accessible by name.
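
The only way to start over that I can think of is to wipe
the ZFS labels on the partition and then create a new pool
in place, roughly like the following (untested; /dev/da0p8
and the create options are just placeholders for
illustration):

# zpool labelclear -f /dev/da0p8
# zpool create -R /zroot-mnt zroot /dev/da0p8

But I do not know whether labelclear is sufficient here, or
whether something else needs to be cleaned up first.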

The sequence that led to the current state went like . . .

# git -C /usr/ports fetch
error: error reading from .git/objects/pack/pack-8e819c78469vm_fault: pager read error, pid 1370 (git)
c212148fe5d3922cc807e6858768e.pack: Input/output error
vm_fault: pager read error, pid 1370 (git)
. . .

# bectl activate main-CA72
panic: VERIFY3(0 == bpobj_open(&bpo, dl->dl_os, dlce->dlce_bpobj)) failed (0 == 97)

cpuid = 0
time = 1661389515
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x30
         pc = 0xffff0000007fcfd0  lr = 0xffff000000101b80
         sp = 0xffff0000b49f4ee0  fp = 0xffff0000b49f50e0

db_trace_self_wrapper() at vpanic+0x178
         pc = 0xffff000000101b80  lr = 0xffff0000004cef08
         sp = 0xffff0000b49f50f0  fp = 0xffff0000b49f5150

vpanic() at spl_panic+0x40
         pc = 0xffff0000004cef08  lr = 0xffff00000129f360
         sp = 0xffff0000b49f5160  fp = 0xffff0000b49f51f0

spl_panic() at dsl_deadlist_space_range+0x264
         pc = 0xffff00000129f360  lr = 0xffff00000133d4f4
         sp = 0xffff0000b49f5200  fp = 0xffff0000b49f53c0

dsl_deadlist_space_range() at snaplist_space+0x4c
         pc = 0xffff00000133d4f4  lr = 0xffff00000133899c
         sp = 0xffff0000b49f53d0  fp = 0xffff0000b49f5440

snaplist_space() at dsl_dataset_promote_check+0x648
         pc = 0xffff00000133899c  lr = 0xffff0000013385e8
         sp = 0xffff0000b49f5450  fp = 0xffff0000b49f5530

dsl_dataset_promote_check() at dsl_sync_task_sync+0xcc
         pc = 0xffff0000013385e8  lr = 0xffff00000135fb3c
         sp = 0xffff0000b49f5540  fp = 0xffff0000b49f5590

dsl_sync_task_sync() at dsl_pool_sync+0x3cc
         pc = 0xffff00000135fb3c  lr = 0xffff00000135251c
         sp = 0xffff0000b49f55a0  fp = 0xffff0000b49f55e0

dsl_pool_sync() at spa_sync+0x8f8
         pc = 0xffff00000135251c  lr = 0xffff00000138be28
         sp = 0xffff0000b49f55f0  fp = 0xffff0000b49f57f0

spa_sync() at txg_sync_thread+0x1d8
         pc = 0xffff00000138be28  lr = 0xffff0000013a2bf8
         sp = 0xffff0000b49f5800  fp = 0xffff0000b49f58f0

txg_sync_thread() at fork_exit+0x88
         pc = 0xffff0000013a2bf8  lr = 0xffff00000047c568
         sp = 0xffff0000b49f5900  fp = 0xffff0000b49f5950

fork_exit() at fork_trampoline+0x14
         pc = 0xffff00000047c568  lr = 0xffff00000081edd4
         sp = 0xffff0000b49f5960  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 4 tid 100198 ]
Stopped at      kdb_enter+0x48: undefined       f907011f

I was unable to boot from the media after this.

Plugged the media into another machine . . .

# zpool import -F -n zroot
cannot import 'zroot': pool was previously in use from another system.
Last accessed by <unknown> (hostid=0) at Wed Aug 24 18:05:15 2022
The pool can be imported, use 'zpool import -f' to import the pool.

# zpool import -f zroot
Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1588]: failed to load zpool zroot
Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1612]: failed to load zpool zroot
Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1616]: failed to load zpool zroot
internal error: cannot import 'zroot': Integrity check failed
Abort trap (core dumped)

# gdb zpool zpool.core
. . .
Core was generated by `zpool import -f zroot'.
Program terminated with signal SIGABRT, Aborted.
Sent by thr_kill() from pid 1716 and user 0.
#0  thr_kill () at thr_kill.S:4
4       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x00002b0c6f3794f0 in __raise (s=s@entry=6) at /usr/main-src/lib/libc/gen/raise.c:52
#2  0x00002b0c6f420494 in abort () at /usr/main-src/lib/libc/stdlib/abort.c:67
#3  0x00002b0c69415744 in zfs_verror (hdl=0x2b0c76263000, error=2092, fmt=fmt@entry=0x2b0c693d3135 "%s", ap=...) at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:344
#4  0x00002b0c69416324 in zpool_standard_error_fmt (hdl=hdl@entry=0x2b0c76263000, error=error@entry=97, fmt=0x2b0c693d3135 "%s") at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:729
#5  0x00002b0c69415ec8 in zpool_standard_error (hdl=0x0, hdl@entry=0x2b0c76263000, error=0, error@entry=97, msg=0x2b0c6ea23350 <__thr_sigprocmask> "\377\203", 
    msg@entry=0x2b0c665668e8 "cannot import 'zroot'") at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:619
#6  0x00002b0c6940687c in zpool_import_props (hdl=0x2b0c76263000, config=config@entry=0x2b0c95939080, newname=newname@entry=0x0, props=props@entry=0x0, flags=flags@entry=2)
    at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_pool.c:2193
#7  0x00002b0be60f3344 in do_import (config=0x2b0c95939080, newname=0x0, mntopts=0x0, props=props@entry=0x0, flags=flags@entry=2) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3190
#8  0x00002b0be60f3108 in import_pools (pools=pools@entry=0x2b0c762780e0, props=<optimized out>, mntopts=mntopts@entry=0x0, flags=flags@entry=2, orig_name=0x2b0c7622d028 "zroot", new_name=0x0, 
    do_destroyed=do_destroyed@entry=B_FALSE, pool_specified=pool_specified@entry=B_TRUE, do_all=B_FALSE, import=0x2b0c665684a0) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3318
#9  0x00002b0be60e9074 in zpool_do_import (argc=1, argv=<optimized out>) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3804
#10 0x00002b0be60e3ce8 in main (argc=4, argv=<optimized out>) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:10918
(gdb) quit
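
Note that the error=97 in frames #4 and #5 is EINTEGRITY on
FreeBSD, which matches both the "Integrity check failed"
message and the (0 == 97) in the earlier VERIFY3 panic.

One thing that can still be done from userland, without
going through the kernel import path, is to point zdb at
the not-imported pool (pool and device names as above;
shown only as a sketch of the sort of inspection that is
available):

# zdb -e -p /dev zroot

(-e examines a pool that is not imported or in the cache
file; -p gives the directory to search for the devices.)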


# zpool import -f -FX -N -R /zroot-mnt -t zroot zprpi
. . . eventually . . .
panic: Solaris(panic): zfs: adding existent segment to range tree (offset=7a001ba000 size=40000)
cpuid = 8
time = 1661395806
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
vcmn_err() at vcmn_err+0x10c
zfs_panic_recover() at zfs_panic_recover+0x64
range_tree_add_impl() at range_tree_add_impl+0x184
range_tree_walk() at range_tree_walk+0xa4
metaslab_load() at metaslab_load+0x6a4
metaslab_preload() at metaslab_preload+0x8c
taskq_run() at taskq_run+0x1c
taskqueue_run_locked() at taskqueue_run_locked+0x190
taskqueue_thread_loop() at taskqueue_thread_loop+0x130
fork_exit() at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 6 tid 108968 ]
Stopped at      kdb_enter+0x44: undefined       f907c27f
db> 

I had to unplug the disk to keep reboots from simply
retrying the import and crashing the same way again.
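
For completeness, things I have not tried yet include a
read-only import and setting the zfs_recover tunable (the
panic above goes through zfs_panic_recover(), which that
tunable is supposed to turn into a warning). Roughly, and
untested here (I have not double-checked the sysctl name):

# zdb -lu /dev/da0p8
# sysctl vfs.zfs.recover=1
# zpool import -f -o readonly=on -N -R /zroot-mnt zroot

(zdb -lu lists the on-disk labels and their uberblocks/txgs
without importing.) Given the -FX result above, I do not
have much hope for these either.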

SIDE NOTE: Then, on reboot, I saw the following:
. . .
Setting hostid: 0x6522bfc4.
cannot import 'zroot': no such pool or dataset
        Destroy and re-create the pool from
        a backup pid 49 (zpool) is attempting to use unsafe AIO requests - not logging anymore
pid 49 (zpool), jid 0, uid 0: exited on signal 6
source.
cachefile import failed, retrying
nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
ASSERT at /usr/main-src/sys/contrib/openzfs/module/nvpair/fnvpair.c:592:fnvpair_value_nvlist()Abort trap
Import of zpool cache /etc/zfs/zpool.cache failed, will retry after root mount hold release
cannot import 'zroot': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.
cachefile imporpid 55 (zpool), jid 0, uid 0: exited on signal 6
t failed, retrying
nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
ASSERT at /usr/main-src/sys/contrib/openzfs/module/nvpair/fnvpair.c:592:fnvpair_value_nvlist()Abort trap
Starting file system checks:
/dev/gpt/CA72opt0EFI: 281 files, 231 MiB free (14770 clusters)
FIXED
. . .

Removing /etc/zfs/zpool.cache kept later reboots from
hitting the above.
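(Concretely, that was just:

# rm /etc/zfs/zpool.cache

so that boot no longer attempts the automatic cachefile
import of zroot.)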
END SIDE NOTE.


For reference, details for the machine that I plugged the
media into:

# uname -apKU # line split for better readability
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #59
main-n256584-5bc926af9fd1-dirty: Wed Jul  6 18:10:52 PDT 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400063 1400063



===
Mark Millard
marklmi at yahoo.com