ZFS: zpool status on degraded pools (FreeBSD12 vs FreeBSD13)
Date: Wed, 14 Jul 2021 21:10:04 UTC
I'm seeking comments on the following two differences in the behavior of ZFS.
The first I consider a bug; the second could be a bug or a conscious choice.

1) Given a pool of 2 disks and one extra disk exactly the same as the 2 pool
members (no ZFS labels on the extra disk): power the box off, replace one pool
disk with the extra disk in the same location, and power the box back on.
The pool state on FreeBSD13 is ONLINE vs DEGRADED on FreeBSD12:

FreeBSD13# zpool status poolXXX
  pool: poolXXX
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME                      STATE     READ WRITE CKSUM
        poolXXX                   ONLINE       0     0     0
          mirror-0                ONLINE       0     0     0
            da3                   ONLINE       0     0     0
            16562597496848792747  UNAVAIL      0     0     0  was /dev/da2

----
On FreeBSD12:

FreeBSD12# zpool status poolXXX
  pool: poolXXX
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 2.01G in 0 days 00:00:36 with 0 errors on Wed Jul 14 17:05:57 2021
config:

        NAME                     STATE     READ WRITE CKSUM
        poolXXX                  DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            da17                 ONLINE       0     0     0
            5133460438962496754  UNAVAIL      0     0     0  was /dev/da4

errors: No known data errors

In the FreeBSD13 case, zpool_get_status() as called from
zpool_main.c::status_callback() gets the correct "reason" for the "status:"
line, but health = zpool_get_state_str(zhp) defaults to the state from
zhp->zpool_config, which ultimately comes from the kernel. The userland logic
does not appear semantically different to me from that on FreeBSD12.

--------------------

2) Add a spare to a degraded pool and issue a zpool replace to activate the spare.
On FreeBSD13, after the resilver is complete, the pool remains DEGRADED until
the degraded disk is removed via zpool detach; on FreeBSD12, the pool becomes
ONLINE when the resilver is complete:

FreeBSD13# zpool status poolXXX
  pool: poolXXX
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 2.47G in 00:00:30 with 0 errors on Wed Jul 14 20:23:31 2021
config:

        NAME        STATE     READ WRITE CKSUM
        poolXXX     DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            da3     ONLINE       0     0     0
            da2     REMOVED      0     0     0
        spares
          da20      AVAIL

errors: No known data errors

FreeBSD13# zpool replace poolXXX da2 da20
FreeBSD13# zpool status poolXXX
  pool: poolXXX
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 2.47G in 00:00:24 with 0 errors on Wed Jul 14 20:39:09 2021
config:

        NAME          STATE     READ WRITE CKSUM
        poolXXX       DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            da3       ONLINE       0     0     0
            spare-1   DEGRADED     0     0     0
              da2     REMOVED      0     0     0
              da20    ONLINE       0     0     0
        spares
          da20        INUSE     currently in use

errors: No known data errors

FreeBSD13# zpool detach poolXXX da2
FreeBSD13# zpool status poolXXX
  pool: poolXXX
 state: ONLINE
  scan: resilvered 2.47G in 00:00:24 with 0 errors on Wed Jul 14 20:39:09 2021
config:

        NAME        STATE     READ WRITE CKSUM
        poolXXX     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da20    ONLINE       0     0     0

errors: No known data errors

-----
On FreeBSD12:

FreeBSD12# zpool status poolXXX
  pool: poolXXX
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        poolXXX                  DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            8657568776772252257  REMOVED      0     0     0  was /dev/da1
            da5                  ONLINE       0     0     0
        spares
          da4                    AVAIL

errors: No known data errors

FreeBSD12# zpool replace poolXXX 8657568776772252257 da4
FreeBSD12# zpool status poolXXX
  pool: poolXXX
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul 14 20:58:14 2021
        2.02G scanned at 64.6M/s, 1.70G issued at 54.5M/s, 2.02G total
        1.70G resilvered, 84.21% done, 0 days 00:00:06 to go
config:

        NAME                       STATE     READ WRITE CKSUM
        poolXXX                    DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            spare-0                REMOVED      0     0     0
              8657568776772252257  REMOVED      0     0     0  was /dev/da1
              da4                  ONLINE       0     0     0  (resilvering)
            da5                    ONLINE       0     0     0
        spares
          6757612719167571619      INUSE     was /dev/da4

errors: No known data errors

FreeBSD12# zpool status poolXXX
  pool: poolXXX
 state: ONLINE
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 2.01G in 0 days 00:00:43 with 0 errors on Wed Jul 14 20:58:57 2021
config:

        NAME                       STATE     READ WRITE CKSUM
        poolXXX                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            spare-0                ONLINE       0     0     0
              8657568776772252257  REMOVED      0     0     0  was /dev/da1
              da4                  ONLINE       0     0     0
            da5                    ONLINE       0     0     0
        spares
          6757612719167571619      INUSE     was /dev/da4

errors: No known data errors

------------------------

autoreplace is off in all pools and zfsd is not running.

Any feedback appreciated.

--
Dave Baukus
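
P.S. For anyone who wants to check for the case 1 symptom mechanically, here is a rough Python sketch that reads the text of `zpool status` and reports any vdev rows in a non-healthy state while the pool-level "state:" line says ONLINE. Everything in it is an assumption made for illustration: the function names parse_zpool_status() and state_mismatch() are made up, the set of "bad" states is my own choice, and the parsing relies only on the output layout shown in this message (it stops at the "spares" header and does not try to handle log or cache sections). It does not use libzfs.

```python
# Vdev states treated as "not healthy" for this check (an assumption).
BAD_STATES = {"UNAVAIL", "REMOVED", "FAULTED", "OFFLINE"}


def parse_zpool_status(text):
    """Return (pool_state, [(name, state), ...]) from `zpool status` text.

    The list holds every row of the config table up to the "spares"
    header, including the row for the pool itself.
    """
    pool_state = None
    vdevs = []
    in_config = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("state:"):
            pool_state = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("config:"):
            in_config = True
        elif stripped.startswith("errors:"):
            in_config = False
        elif in_config and stripped:
            fields = stripped.split()
            if fields[0] == "NAME":
                continue  # column header row
            if len(fields) < 2:
                if fields[0] == "spares":
                    in_config = False  # spare rows are not pool members
                continue
            vdevs.append((fields[0], fields[1]))
    return pool_state, vdevs


def state_mismatch(text):
    """List the unhealthy vdev rows if the pool claims to be ONLINE."""
    pool_state, vdevs = parse_zpool_status(text)
    if pool_state != "ONLINE":
        return []
    return [(name, state) for name, state in vdevs if state in BAD_STATES]


# The FreeBSD13 output from case 1, trimmed to the lines that matter.
SAMPLE = """\
  pool: poolXXX
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
config:

        NAME                      STATE     READ WRITE CKSUM
        poolXXX                   ONLINE       0     0     0
          mirror-0                ONLINE       0     0     0
            da3                   ONLINE       0     0     0
            16562597496848792747  UNAVAIL      0     0     0  was /dev/da2
"""

if __name__ == "__main__":
    for name, state in state_mismatch(SAMPLE):
        print(f"pool reports ONLINE but {name} is {state}")
```

Note that this check is deliberately crude: it would also flag the final FreeBSD12 output in case 2, where a REMOVED disk sits under an active spare and the pool is ONLINE, so a hit means "look closer", not necessarily "bug".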