Re: How to get past "internal error: cannot import 'zroot': Integrity check failed" (no ability to import the pool)?
Date: Sun, 28 Aug 2022 00:18:09 UTC
On 2022-Aug-24, at 20:57, Mark Millard <marklmi@yahoo.com> wrote:

> I seem to have gotten into a state where no zpool-related
> command that requires identification of a pool (such as
> by name) can work, because import cannot make the zpool
> available. (I give more context later.)
>
> How do I re-establish the freebsd-zfs partition into
> a form that I can repopulate, when its failed pool
> cannot be imported? I'm apparently limited to
> zpool commands that reference the device instead of the
> pool (name), because the likes of "zpool import -f -FX
> . . ." lead to a panic.
>
> Note that this was for media that used zfs just to use
> bectl, not for other typical zfs reasons. For example,
> redundancy was-not/is-not a goal. For reference:
>
> =>         40  3907029088  da0  GPT  (1.8T)
>            40       32728       - free -  (16M)
>         32768      524288    1  efi  (256M)
>        557056     7340032    2  freebsd-swap  (3.5G)
>       7897088    26214400       - free -  (13G)
>      34111488    20971520    3  freebsd-swap  (10G)
>      55083008    12582912       - free -  (6.0G)
>      67665920    29360128    4  freebsd-swap  (14G)
>      97026048     4194304       - free -  (2.0G)
>     101220352    33554432    5  freebsd-swap  (16G)
>     134774784    67108864    6  freebsd-swap  (32G)
>     201883648   364904448    7  freebsd-swap  (174G)
>     566788096  2795503616    8  freebsd-zfs  (1.3T)
>    3362291712   544737416       - free -  (260G)
>
> At this point, trying to preserve the content of the
> freebsd-zfs partition does not look like a viable way
> of going. But I'm unclear on how to even start over,
> given no ability to make the pool accessible by name.
>
> The sequence leading to the current state went like . . .
>
> # git -C /usr/ports fetch
> error: error reading from .git/objects/pack/pack-8e819c78469vm_fault: pager read error, pid 1370 (git)
> c212148fe5d3922cc807e6858768e.pack: Input/output error
> vm_fault: pager read error, pid 1370 (git)
> . . .
>
> # bectl activate main-CA72
> panic: VERIFY3(0 == bpobj_open(&bpo, dl->dl_os, dlce->dlce_bpobj)) failed (0 == 97)
>
> cpuid = 0
> time = 1661389515
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x30
>  pc = 0xffff0000007fcfd0  lr = 0xffff000000101b80
>  sp = 0xffff0000b49f4ee0  fp = 0xffff0000b49f50e0
>
> db_trace_self_wrapper() at vpanic+0x178
>  pc = 0xffff000000101b80  lr = 0xffff0000004cef08
>  sp = 0xffff0000b49f50f0  fp = 0xffff0000b49f5150
>
> vpanic() at spl_panic+0x40
>  pc = 0xffff0000004cef08  lr = 0xffff00000129f360
>  sp = 0xffff0000b49f5160  fp = 0xffff0000b49f51f0
>
> spl_panic() at dsl_deadlist_space_range+0x264
>  pc = 0xffff00000129f360  lr = 0xffff00000133d4f4
>  sp = 0xffff0000b49f5200  fp = 0xffff0000b49f53c0
>
> dsl_deadlist_space_range() at snaplist_space+0x4c
>  pc = 0xffff00000133d4f4  lr = 0xffff00000133899c
>  sp = 0xffff0000b49f53d0  fp = 0xffff0000b49f5440
>
> snaplist_space() at dsl_dataset_promote_check+0x648
>  pc = 0xffff00000133899c  lr = 0xffff0000013385e8
>  sp = 0xffff0000b49f5450  fp = 0xffff0000b49f5530
>
> dsl_dataset_promote_check() at dsl_sync_task_sync+0xcc
>  pc = 0xffff0000013385e8  lr = 0xffff00000135fb3c
>  sp = 0xffff0000b49f5540  fp = 0xffff0000b49f5590
>
> dsl_sync_task_sync() at dsl_pool_sync+0x3cc
>  pc = 0xffff00000135fb3c  lr = 0xffff00000135251c
>  sp = 0xffff0000b49f55a0  fp = 0xffff0000b49f55e0
>
> dsl_pool_sync() at spa_sync+0x8f8
>  pc = 0xffff00000135251c  lr = 0xffff00000138be28
>  sp = 0xffff0000b49f55f0  fp = 0xffff0000b49f57f0
>
> spa_sync() at txg_sync_thread+0x1d8
>  pc = 0xffff00000138be28  lr = 0xffff0000013a2bf8
>  sp = 0xffff0000b49f5800  fp = 0xffff0000b49f58f0
>
> txg_sync_thread() at fork_exit+0x88
>  pc = 0xffff0000013a2bf8  lr = 0xffff00000047c568
>  sp = 0xffff0000b49f5900  fp = 0xffff0000b49f5950
>
> fork_exit() at fork_trampoline+0x14
>  pc = 0xffff00000047c568  lr = 0xffff00000081edd4
>  sp = 0xffff0000b49f5960  fp = 0x0000000000000000
>
> KDB: enter: panic
> [ thread pid 4 tid 100198 ]
> Stopped at      kdb_enter+0x48: undefined       f907011f
>
> I was unable to boot from the media after this.
>
> Plugged the media into another machine . . .
>
> # zpool import -F -n zroot
> cannot import 'zroot': pool was previously in use from another system.
> Last accessed by <unknown> (hostid=0) at Wed Aug 24 18:05:15 2022
> The pool can be imported, use 'zpool import -f' to import the pool.
>
> # zpool import -f zroot
> Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1588]: failed to load zpool zroot
> Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1612]: failed to load zpool zroot
> Aug 24 18:13:47 CA72_16Gp_ZFS ZFS[1616]: failed to load zpool zroot
> internal error: cannot import 'zroot': Integrity check failed
> Abort trap (core dumped)
>
> # gdb zpool zpool.core
> . . .
> Core was generated by `zpool import -f zroot'.
> Program terminated with signal SIGABRT, Aborted.
> Sent by thr_kill() from pid 1716 and user 0.
> #0  thr_kill () at thr_kill.S:4
> 4       RSYSCALL(thr_kill)
> (gdb) bt
> #0  thr_kill () at thr_kill.S:4
> #1  0x00002b0c6f3794f0 in __raise (s=s@entry=6) at /usr/main-src/lib/libc/gen/raise.c:52
> #2  0x00002b0c69420494 in abort () at /usr/main-src/lib/libc/stdlib/abort.c:67
> #3  0x00002b0c69415744 in zfs_verror (hdl=0x2b0c76263000, error=2092, fmt=fmt@entry=0x2b0c693d3135 "%s", ap=...)
>     at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:344
> #4  0x00002b0c69416324 in zpool_standard_error_fmt (hdl=hdl@entry=0x2b0c76263000, error=error@entry=97, fmt=0x2b0c693d3135 "%s") at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:729
> #5  0x00002b0c69415ec8 in zpool_standard_error (hdl=0x0, hdl@entry=0x2b0c76263000, error=0, error@entry=97, msg=0x2b0c6ea23350 <__thr_sigprocmask> "\377\203",
>     msg@entry=0x2b0c665668e8 "cannot import 'zroot'") at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_util.c:619
> #6  0x00002b0c6940687c in zpool_import_props (hdl=0x2b0c76263000, config=config@entry=0x2b0c95939080, newname=newname@entry=0x0, props=props@entry=0x0, flags=flags@entry=2)
>     at /usr/main-src/sys/contrib/openzfs/lib/libzfs/libzfs_pool.c:2193
> #7  0x00002b0be60f3344 in do_import (config=0x2b0c95939080, newname=0x0, mntopts=0x0, props=props@entry=0x0, flags=flags@entry=2) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3190
> #8  0x00002b0be60f3108 in import_pools (pools=pools@entry=0x2b0c762780e0, props=<optimized out>, mntopts=mntopts@entry=0x0, flags=flags@entry=2, orig_name=0x2b0c7622d028 "zroot", new_name=0x0,
>     do_destroyed=do_destroyed@entry=B_FALSE, pool_specified=pool_specified@entry=B_TRUE, do_all=B_FALSE, import=0x2b0c665684a0) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3318
> #9  0x00002b0be60e9074 in zpool_do_import (argc=1, argv=<optimized out>) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:3804
> #10 0x00002b0be60e3ce8 in main (argc=4, argv=<optimized out>) at /usr/main-src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:10918
> (gdb) quit
>
>
> # zpool import -f -FX -N -R /zroot-mnt -t zroot zprpi
> . . . eventually . . .
> panic: Solaris(panic): zfs: adding existent segment to range tree (offset=7a001ba000 size=40000)
> cpuid = 8
> time = 1661395806
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> vcmn_err() at vcmn_err+0x10c
> zfs_panic_recover() at zfs_panic_recover+0x64
> range_tree_add_impl() at range_tree_add_impl+0x184
> range_tree_walk() at range_tree_walk+0xa4
> metaslab_load() at metaslab_load+0x6a4
> metaslab_preload() at metaslab_preload+0x8c
> taskq_run() at taskq_run+0x1c
> taskqueue_run_locked() at taskqueue_run_locked+0x190
> taskqueue_thread_loop() at taskqueue_thread_loop+0x130
> fork_exit() at fork_exit+0x88
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
> [ thread pid 6 tid 108968 ]
> Stopped at      kdb_enter+0x44: undefined       f907c27f
> db>
>
> I had to unplug the disk to avoid reboots simply retrying
> the import and crashing the same way again.
>
> SIDE NOTE: Then, on reboot, I saw the following:
> . . .
> Setting hostid: 0x6522bfc4.
> cannot import 'zroot': no such pool or dataset
>         Destroy and re-create the pool from
>         a backup pid 49 (zpool) is attempting to use unsafe AIO requests - not logging anymore
> pid 49 (zpool), jid 0, uid 0: exited on signal 6
> source.
> cachefile import failed, retrying
> nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
> ASSERT at /usr/main-src/sys/contrib/openzfs/module/nvpair/fnvpair.c:592:fnvpair_value_nvlist()Abort trap
> Import of zpool cache /etc/zfs/zpool.cache failed, will retry after root mount hold release
> cannot import 'zroot': no such pool or dataset
>         Destroy and re-create the pool from
>         a backup source.
> cachefile imporpid 55 (zpool), jid 0, uid 0: exited on signal 6
> t failed, retrying
> nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
> ASSERT at /usr/main-src/sys/contrib/openzfs/module/nvpair/fnvpair.c:592:fnvpair_value_nvlist()Abort trap
> Starting file system checks:
> /dev/gpt/CA72opt0EFI: 281 files, 231 MiB free (14770 clusters)
>                                                                 FIXED
> . . .
>
> Removing /etc/zfs/zpool.cache allowed reboots to avoid such failures.
> END SIDE NOTE.
>
>
> For reference, for the machine where I can plug in
> the media:
>
> # uname -apKU # line split for better readability
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #59
> main-n256584-5bc926af9fd1-dirty: Wed Jul  6 18:10:52 PDT 2022
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
> arm64 aarch64 1400063 1400063
>

I ended up doing:

# zpool labelclear -f /dev/da0p8
# zpool create -o compatibility=openzfs-2.1-freebsd -O compress=lz4 -O atime=off -f -tzprpi zroot /dev/da0p8
# zpool export zprpi

and then followed my normal update procedure for the media (sends and
some adjustments), making the zroot a variant of what is on another
machine (under a different pool name).

And . . . the media is once again importable, even bootable, media.
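For reference, a minimal sketch of one way the "sends and some
adjustments" repopulation step can look. The host name src-host,
source pool name srczroot, and snapshot name @replica below are
hypothetical placeholders, not the names actually used, and the
details would need adjusting to the real dataset layout.

On the machine holding the known-good copy of the tree:

# zfs snapshot -r srczroot@replica

On the machine with the freshly created pool, while it is still
imported under the temporary name zprpi (-u keeps the received
datasets from being mounted over the running system; -F lets the
receive replace the just-created, empty root dataset):

# ssh src-host 'zfs send -R srczroot@replica' | zfs recv -u -F zprpi
# zpool export zprpi

Pool properties are not carried by the send stream, so things like
bootfs may still need setting by hand before the media is expected
to boot on its own. After the export the pool identifies itself by
its persistent name (zroot) again, since zprpi was only the
temporary import name from -t.

===
Mark Millard
marklmi at yahoo.com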