Adding to a zpool -- different redundancies and risks

David Christensen dpchrist at holgerdanske.com
Fri Dec 13 04:50:03 UTC 2019


On 2019-12-12 04:42, Norman Gray wrote:
> 
> David, hello.

Hi.  :-)


> On 12 Dec 2019, at 5:11, David Christensen wrote:
> 
>> Please post:
>>
>> 1   The 'zpool create ...' command you used to create the existing
>> pool.

On 2019-12-12 06:33, Norman Gray wrote:
 > # zpool history pool
 > History for 'pool':
 > 2017-08-20.15:45:43 zpool create -m /pool pool raidz2 da2 da3 da4 da5
 > da6 da7 da8 da9 da10 raidz2 da11 da12 da13 da14 da15 da16 da17 da18 da19

Okay.


On 2019-12-12 04:42, Norman Gray wrote:
>> 2.  The output of 'zpool status' for the existing pool.
> 
> # zpool status pool
>     pool: pool
>    state: ONLINE
> status: Some supported features are not enabled on the pool. The pool can
> 	still be used, but some features are unavailable.
> action: Enable all features using 'zpool upgrade'. Once this is done,
> 	the pool may no longer be accessible by software that does not support
> 	the features. See zpool-features(7) for details.
>     scan: none requested
> config:
> 
> 	NAME             STATE     READ WRITE CKSUM
> 	pool             ONLINE       0     0     0
> 	  raidz2-0       ONLINE       0     0     0
> 	    label/zd032  ONLINE       0     0     0
> 	    label/zd033  ONLINE       0     0     0
> 	    label/zd034  ONLINE       0     0     0
> 	    label/zd035  ONLINE       0     0     0
> 	    label/zd036  ONLINE       0     0     0
> 	    label/zd037  ONLINE       0     0     0
> 	    label/zd038  ONLINE       0     0     0
> 	    label/zd039  ONLINE       0     0     0
> 	    label/zd040  ONLINE       0     0     0
> 	  raidz2-1       ONLINE       0     0     0
> 	    label/zd041  ONLINE       0     0     0
> 	    label/zd042  ONLINE       0     0     0
> 	    label/zd043  ONLINE       0     0     0
> 	    label/zd044  ONLINE       0     0     0
> 	    label/zd045  ONLINE       0     0     0
> 	    label/zd046  ONLINE       0     0     0
> 	    label/zd047  ONLINE       0     0     0
> 	    label/zd048  ONLINE       0     0     0
> 	    label/zd049  ONLINE       0     0     0
> 
> errors: No known data errors
> #
> 
> (Note: since creating the pool, I realised that gpart labels were a Good
> Thing, hence exported, labelled, and imported the pool, hence the
> difference from the da* pool creation).

So, two raidz2 vdevs of nine 5.5 TB drives each, striped into one pool.
Each vdev can store (9 - 2) * 5.5 = 38.5 TB of data (two drives' worth
of parity per vdev), and the pool can store 38.5 + 38.5 = 77 TB.


>> 3.  The output of 'zpool list' for the existing pool.
> 
> # zpool list pool
> NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> pool    98T  75.2T  22.8T        -         -    29%    76%  1.00x  ONLINE  -

So, per 'zpool list' (which reports raw space, parity included), your
pool is 75.2 TB / 98 TB = roughly 76% full -- which is what the CAP
column says.
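
If you want the usable, after-parity numbers, a quick cross-check would
be something like:

# zfs list -o name,used,avail pool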


>> 4.  The 'zpool add ...' command you are contemplating.
> 
> # zpool add -n pool raidz2 label/zd05{0,1,2,3,4,5}
> invalid vdev specification
> use '-f' to override the following errors:
> mismatched replication level: pool uses 9-way raidz and new vdev uses
> 6-way raidz

I believe your understanding of the warning is correct -- ZFS is saying
that the added raidz2 vdev does not have the same number of drives
(six) as the two existing raidz2 vdevs (nine drives each).


> The six new disks are 12TB; the 18 original ones 5.5TB.

As you stated before.


>> So, you have 24 drives in a 24 drive cage?
> 
> That's correct -- the maximum the chassis will take.

Okay.


>> What are your space and performance goals?
> 
> Not very explicit: TB/currency-unit as high as possible.  Performance:
> bottlenecks are likely to be elsewhere (network, processing power) so no
> stringent requirements.  Though this is a fairly general-purpose data
> store, a large fraction of the datasets on the machine comprise a number
> of 10GB single files, served via NFS.
> 
>> What are your sustainability goals as drives and/or VDEV's fail?
> 
> It doesn't have to be high availability, so if I have a drive failure, I
> can consider shutting the machine down until a replacement disk arrives
> and can be resilvered.  This is a mirror of data where the masters are
> elsewhere on the planet, so this machine is 'reliable storage but not
> backed up' (and the users know this).  Thus if I do decide to keep
> running with one failed disk in one VDEV, and the worst comes to the
> worst and the whole thing explodes... the world won't end.  I will be
> cross, and users will moan, in either case, but they know this is a
> problem that can fundamentally be solved with more money.
> 
> I'm sure I could be more sophisticated about this (and any suggestions
> are welcome), but unfortunately I don't have as much time to spend on
> storage problems as I'd like, so I'd like to avoid creating a setup
> which is smarter than I'm able to fix!
> 
> Best wishes,
> 
> Norman

Okay.


I believe that if you gave the -f option to 'zpool add', the six 12 TB
drives would be formed into a raidz2 vdev and this new vdev would be
striped onto your existing pool.  The pool would then have a total
capacity of 38.5 + 38.5 + (6 - 2) * 12 = 125 TB.  ZFS would spread new
writes across the three vdevs, weighted toward whichever vdev has the
most free space (so mostly the new one at first -- existing data is not
rebalanced), potentially improving performance under concurrent
workloads.  The pool could withstand two drive failures in any single
raidz2 vdev, but three drive failures in the same vdev would result in
total data loss.
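
In non-dry-run form, that would be something along the lines of (same
labels as in your -n test):

# zpool add -f pool raidz2 label/zd050 label/zd051 label/zd052 \
      label/zd053 label/zd054 label/zd055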


That said, read this article:

https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/


So:

1.  Pair the eighteen 5.5 TB drives into nine 5.5 TB mirrors (49.5 TB).

2.  Pair the six 12 TB drives into three 12 TB mirrors (36 TB).

3.  Stripe all the mirrors into a pool (85.5 TB).


The pool could withstand one drive failure in any single mirror, but two
drive failures in the same mirror would result in total data loss.  That
risk is greatest during the window between a drive failing and its
replacement finishing resilvering (a window that is much shorter for a
mirror than for a wide raidz2 vdev).
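
For illustration only, a pool with that layout might be created along
these lines.  This is a sketch, not a recipe: it assumes the data has
been copied off and the existing pool destroyed first (ZFS cannot
reshape raidz2 vdevs into mirrors in place), it reuses the labels from
your 'zpool status' output, and it assumes the new 12 TB drives are
labelled zd050-zd055 as in your 'zpool add -n' test:

# zpool create -m /pool pool \
      mirror label/zd032 label/zd033 \
      mirror label/zd034 label/zd035 \
      mirror label/zd036 label/zd037 \
      mirror label/zd038 label/zd039 \
      mirror label/zd040 label/zd041 \
      mirror label/zd042 label/zd043 \
      mirror label/zd044 label/zd045 \
      mirror label/zd046 label/zd047 \
      mirror label/zd048 label/zd049 \
      mirror label/zd050 label/zd051 \
      mirror label/zd052 label/zd053 \
      mirror label/zd054 label/zd055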


AIUI this architecture has another benefit -- incremental pool growth.
You replace one 5.5 TB drive in a mirror with a 12 TB drive, resilver,
replace the other 5.5 TB drive in the same mirror with another 12 TB
drive, resilver, and (assuming the pool's autoexpand property is on) the
pool is now 6.5 TB larger.  In the long run, you end up with twenty-four
12 TB drives (144 TB pool).  The process could then be repeated (or
preempted) using even bigger drives.
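
A rough sketch of one such swap -- 'label/new0' and 'label/new1' are
just hypothetical names for the two replacement 12 TB drives, and the
zd032/zd033 pairing assumes the mirror layout sketched above:

# zpool set autoexpand=on pool
# zpool replace pool label/zd032 label/new0
  (wait until 'zpool status pool' shows the resilver has finished)
# zpool replace pool label/zd033 label/new1
  (wait again; once both halves of the mirror are 12 TB, the pool grows)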


David

