Re: ZFS: zpool status on degraded pools (FreeBSD12 vs FreeBSD13)
Date: Wed, 14 Jul 2021 22:17:05 UTC
On 7/14/2021 17:45, Dave Baukus wrote:
> On 7/14/21 3:21 PM, Alan Somers wrote:
>> On Wed, Jul 14, 2021 at 3:10 PM Dave Baukus <daveb@spectralogic.com> wrote:
>>> I'm seeking comments on the following 2 differences in the behavior of ZFS.
>>> The first I consider a bug; the second could be a bug or a conscious choice:
>>>
>>> 1) Given a pool of 2 disks and one extra disk exactly the same as the 2 pool
>>> members (no ZFS labels on the extra disk), power the box off, replace one pool
>>> disk with the extra disk in the same location, and power the box back on.
>>>
>>> The pool state on FreeBSD 13 is ONLINE vs. DEGRADED on FreeBSD 12.
>>
>> I agree, the FreeBSD 13 behavior seems like a bug.
>>
>>> 2) Add a spare to a degraded pool and issue a zpool replace to activate the
>>> spare. On FreeBSD 13, after the resilver is complete, the pool remains degraded
>>> until the degraded disk is removed via zpool detach; on FreeBSD 12, the pool
>>> becomes ONLINE when the resilver is complete.
>>
>> I agree. I think I prefer the FreeBSD 13 behavior, but either way is sensible.
>>
>> The change is no doubt due to the OpenZFS import in FreeBSD 13. Have you tried
>> to determine the responsible commits? They could be regressions in OpenZFS, or
>> they could be bugs that we fixed in FreeBSD but never upstreamed.
>> -Alan
>
> Thanks for the feedback, Alan. I have not yet dug into #1 beyond zpool,
> lib[zpool|zfs].
>
> --
> Dave Baukus

IMHO....

(12.2-STABLE)
root@NewFS:/home/karl # zpool status backup
  pool: backup
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 09:25:28 with 0 errors on Wed Jun 30 12:33:35 2021
config:

        NAME                      STATE     READ WRITE CKSUM
        backup                    DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            gpt/backup8.eli       ONLINE       0     0     0
            9628424513629875622   OFFLINE      0     0     0  was /dev/gpt/backup8-1.eli
            gpt/backup8-2.eli     ONLINE       0     0     0

errors: No known data errors

This is IMHO correct behavior. I do this intentionally; the other disk is
physically offsite. When I go to swap them I take 8-2 offline, remove it, go
swap it with 8-1, and bring 8-1 online (rough command sequence sketched below).
The mirror resilvers, but when it's done it still shows "DEGRADED" because it
is: it has three members and one was (deliberately) removed and is not in the
building. This is the last-ditch, building-burned-down (or similar catastrophe)
offsite backup, of course. That pool is normally exported except when
synchronizing using zfs send/recv.

If one of the other two fails (they are subject to a routine scrub schedule),
then when I do a "replace" on it, the pool is *still* degraded when the
resilver finishes. The only time it would not be is if all three disks are
physically in the machine at once, which is not something I usually do, for
obvious reasons. The exception is when I need to make that pool larger; then
they all have to be here, since all three members have to be present and
online for the expand to work.

So long as at least *one* of the three mirror members has not been
destroyed/damaged and is intact, I still have a fully-functional backup from
which the running system's data sets can be restored.

If -13 would show that configuration "ONLINE" then IMHO what it is reporting
is broken; there is a missing member in the mirror set, albeit in this case
intentionally.
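In zpool terms, the swap described above is roughly the following (a sketch
using the labels from the status output; the geli attach/detach and
import/export housekeeping is omitted):

    zpool offline backup gpt/backup8-2.eli    # outgoing disk, headed offsite
    # ...physically swap it for the disk that was offsite (backup8-1)...
    zpool online backup gpt/backup8-1.eli     # returning disk; the mirror resilvers
    zpool status backup                       # still DEGRADED after the resilver,
                                              # because the third member is offsite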
--
Karl Denninger
karl@denninger.net
The Market Ticker
[S/MIME encrypted email preferred]