ZFS...
Freddie Cash
fjwcash at gmail.com
Wed May 8 16:54:07 UTC 2019
On Wed, May 8, 2019 at 9:31 AM Karl Denninger <karl at denninger.net> wrote:
> I have a system here with about the same amount of net storage on it as
> you did. It runs scrubs regularly; none of them take more than 8 hours
> on *any* of the pools. The SSD-based pool is of course *much* faster
> but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it
> kicks off automatically at 2:00 AM when the time comes but is complete
> before noon. I run them on 14 day intervals.
>
Damn, I wish our scrubs took 8 hours. :)
Storage pool 1: 90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA). 45 hours to scrub.
Storage pool 2: 90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA). 33 hours to scrub.
Storage pool 3: 24 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA). 134 hours to scrub.
Storage pool 4: 24 drives in 6-disk raidz2 vdevs (mix of 1 TB, 2 TB, 4 TB
SATA). Dedupe enabled. 256 hours to scrub.
Storage pool 5: 90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA). Dedupe enabled. Takes about 6 weeks to resilver a drive, and it's
constantly resilvering drives these days as it's the oldest pool, and all
the drives are dying.
:D
Pools 1, 3, and 4 are in DC1. Pools 2 and 5 are in DC2 across town.
Pool 1 sends snapshots to pool 2. Pools 3 and 4 send snapshots to pool 5.
These pools are highly fragmented. :)
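For scale, here is a rough sketch that converts the scrub times listed above into days. The hour figures are the ones quoted for pools 1 to 4; the dictionary and function names are made up for illustration only.

```python
# Scrub durations in hours, as quoted in the pool list above.
# Pool 5 is left out because only a resilver time (about 6 weeks) is given for it.
SCRUB_HOURS = {
    "pool 1": 45,
    "pool 2": 33,
    "pool 3": 134,
    "pool 4": 256,
}


def hours_to_days(hours):
    """Convert a duration in hours to days."""
    return hours / 24


def scrub_days(scrub_hours):
    """Map each pool name to its scrub time in days, rounded to one decimal."""
    return {pool: round(hours_to_days(h), 1) for pool, h in scrub_hours.items()}


if __name__ == "__main__":
    for pool, days in scrub_days(SCRUB_HOURS).items():
        print(f"{pool}: {days} days")
```

By this arithmetic the slowest scrub (pool 4, with dedupe) comes to a bit under 11 days.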
> If you have pool(s) that are taking *two weeks* to run a scrub IMHO
> either something is badly wrong or you need to rethink organization of
> the pool structure -- that is, IMHO you likely either have a severe
> performance problem with one or more members or an architectural problem
> you *really* need to determine and fix. If a scrub takes two weeks
> *then a resilver could conceivably take that long as well* and that's
> *extremely* bad as the window for getting screwed is at its worst when a
> resilver is being run.
>
Thankfully, ours are strictly storage for backups of other systems, so as
long as the nightly backups complete successfully before 6 am, we're not
worried about performance. :) And we do have plans to replace pools 2 and
5 to remove dedupe from the equation. There's not a lot we can do about
the fragmentation issue, as these servers all run rsync backups from
200-odd other servers, and remove the oldest snapshot every night.
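As a minimal sketch of the "remove the oldest snapshot every night" step, assuming a fixed retention count (the count, the snapshot names, and the function name below are hypothetical, not taken from our actual scripts): given snapshot names sorted oldest first, pick the ones to drop.

```python
def snapshots_to_remove(snapshots_oldest_first, keep):
    """Return the snapshots to destroy so that at most `keep` remain.

    `snapshots_oldest_first` is a list of snapshot names sorted by creation
    time, oldest first. `keep` is an assumed retention count.
    """
    excess = len(snapshots_oldest_first) - keep
    if excess <= 0:
        return []
    return snapshots_oldest_first[:excess]


if __name__ == "__main__":
    # Hypothetical snapshot names, for illustration only.
    snaps = [
        "backups/host1@2019-05-05",
        "backups/host1@2019-05-06",
        "backups/host1@2019-05-07",
        "backups/host1@2019-05-08",
    ]
    for name in snapshots_to_remove(snaps, keep=3):
        print("would destroy:", name)
```

In practice the sorted list could come from something like `zfs list -H -t snapshot -o name -s creation`, with each returned name passed to `zfs destroy`. With one new snapshot per night and a steady retention count, this removes exactly one snapshot per run.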
So, while a 2-week scrub may be horrible, it all depends on the use case.
If these were direct storage systems for in-production servers, then I'd be
worried. But as redundant backup systems (3 copies of everything, in 3
separate locations around the city), I'm not too worried. Yet. :D
--
Freddie Cash
fjwcash at gmail.com