ZFS...

Wed May 8 16:54:07 UTC 2019

On Wed, May 8, 2019 at 9:31 AM Karl Denninger <karl at denninger.net> wrote:

> I have a system here with about the same amount of net storage on it as
> you did.  It runs scrubs regularly; none of them take more than 8 hours
> on *any* of the pools.  The SSD-based pool is of course *much* faster
> but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it
> kicks off automatically at 2:00 AM when the time comes but is complete
> before noon.  I run them on 14 day intervals.
>

Damn, I wish our scrubs took 8 hours.  :)

Storage pool 1:  90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  45 hours to scrub.

Storage pool 2:  90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  33 hours to scrub.

Storage pool 3:  24 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  134 hours to scrub.

Storage pool 4:  24 drives in 6-disk raidz2 vdevs (mix of 1 TB, 2 TB, 4 TB
SATA).  Dedupe enabled.  256 hours to scrub.

Storage pool 5:  90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  Dedupe enabled.  Takes about 6 weeks to resilver a drive, and it's
constantly resilvering drives these days as it's the oldest pool, and all
the drives are dying.

:D
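
For a sense of scale, here is a rough sketch that converts the scrub times listed above into days and compares them with a 14-day scrub interval. The 14-day interval is taken from the quoted message and is only an assumption for these pools, whose scrub schedule is not stated. The function and variable names are made up for illustration.

```python
# Scrub durations in hours, as reported in the pool list above.
# Pool 5 is omitted because no scrub time is given for it.
scrub_hours = {"pool 1": 45, "pool 2": 33, "pool 3": 134, "pool 4": 256}

# Assumption: a 14-day scrub interval, as mentioned in the quoted message.
INTERVAL_HOURS = 14 * 24


def scrub_summary(hours, interval_hours=INTERVAL_HOURS):
    """Return (duration in days, fraction of the interval spent scrubbing)."""
    return hours / 24, hours / interval_hours


if __name__ == "__main__":
    for pool, hours in scrub_hours.items():
        days, share = scrub_summary(hours)
        print(f"{pool}: {days:.1f} days, {share:.0%} of a 14-day interval")
```

Under that assumption, pool 4 would spend roughly three quarters of every interval scrubbing.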

Pools 1, 3, and 4 are in DC1.  Pools 2 and 5 are in DC2 across town.

Pool 1 sends snapshots to pool 2.  Pools 3 and 4 send snapshots to pool 5.

These pools are highly fragmented.  :)

> If you have pool(s) that are taking *two weeks* to run a scrub IMHO
> either something is badly wrong or you need to rethink organization of
> the pool structure -- that is, IMHO you likely either have a severe
> performance problem with one or more members or an architectural problem
> you *really* need to determine and fix.  If a scrub takes two weeks
> *then a resilver could conceivably take that long as well* and that's
> *extremely* bad as the window for getting screwed is at its worst when a
> resilver is being run.
>

Thankfully, ours are strictly storage for backups of other systems, so as
long as the nightly backups complete successfully before 6 am, we're not
worried about performance.  :)  And we do have plans to replace pools 2 and
5 to remove dedupe from the equation.  There's not a lot we can do about
the fragmentation issue, as these servers all run rsync backups from
200-odd other servers, and remove the oldest snapshot every night.
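
The nightly "remove the oldest snapshot" step could be sketched roughly as below. This is not the actual script used on these servers. The retention count, the dataset and snapshot names, and both helper functions are hypothetical. The snapshot list is assumed to come from something like `zfs list -H -t snapshot -o name,creation -p`, and the code only builds the `zfs destroy` command rather than running it.

```python
def snapshots_to_remove(snapshots, keep):
    """Pick the snapshots that exceed the retention count, oldest first.

    snapshots: list of (name, creation_time) tuples, creation_time being
               any sortable value such as a Unix timestamp.
    keep:      hypothetical number of snapshots to retain.
    """
    ordered = sorted(snapshots, key=lambda snap: snap[1])
    excess = max(len(ordered) - keep, 0)
    return [name for name, _ in ordered[:excess]]


def destroy_command(name):
    """Build (but do not run) the command that destroys one snapshot."""
    return ["zfs", "destroy", name]


if __name__ == "__main__":
    # Hypothetical dataset and snapshot names with Unix timestamps.
    snaps = [
        ("backup/host1@2019-05-07", 1557187200),
        ("backup/host1@2019-05-05", 1557014400),
        ("backup/host1@2019-05-06", 1557100800),
    ]
    for name in snapshots_to_remove(snaps, keep=2):
        print(" ".join(destroy_command(name)))
```

Creating and destroying snapshots every night while rsync rewrites changed files is the churn that the fragmentation complaint refers to.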

So, while a 2-week scrub may be horrible, it all depends on the use case.
If these were direct storage systems for in-production servers, then I'd be
worried.  But as redundant backup systems (3 copies of everything, in 3
separate locations around the city), I'm not too worried.  Yet.  :D

-- 
Freddie Cash
fjwcash at gmail.com