Restructure a ZFS Pool
Paul Kraus
paul at kraus-haus.org
Thu Sep 24 13:31:48 UTC 2015
On Sep 24, 2015, at 8:42, Raimund Sacherer <raimund.sacherer at logitravel.com> wrote:
> I had the pool fill up to over 80%, then I got it back to about 50-60%, but it feels more sluggish. I use a lot of NFS, and we use it to back up some 5 million files in lots of subdirectories (a/b/c/d/abcd...), besides other big files (SQL dump backups, bacula, etc.)
>
> I said sluggish above because I do not have empirical data and do not know exactly how to test the system correctly, but I have read a lot, and there seem to be suggestions that an independent ZIL helps with copy-on-write fragmentation if you use NFS, etc.
A SLOG (Separate Log Device) will not remove existing fragmentation, but it will help prevent future fragmentation _iff_ (if and only if) the write operations are synchronous. NFS is not, by itself, sync, but the write calls on the client _may_ be sync.
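As a quick check (the dataset name tank/nfs here is only an example), you can see how sync requests are being handled, and force them if you want to be certain:

    # show the current sync policy for the dataset backing the NFS export
    zfs get sync tank/nfs

    # honor the client's sync requests (the default)
    zfs set sync=standard tank/nfs

    # treat every write as synchronous; all writes then go through
    # the ZIL, and through the SLOG once one is attached
    zfs set sync=always tank/nfs

Be aware that sync=always adds ZIL traffic for every write, so only use it if you need the guarantee.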
> What I would like to know is if I can eliminate one Spare disk from the pool, and add it as a ZIL again, without having to shutdown/reboot the server?
Yes, but unless you can stand losing data in flight (writes that the system says have been committed but have only made it to the SLOG), you really want your SLOG vdev to be a mirror (at least 2 drives).
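The swap itself can be done online, something along these lines (pool and device names are made up for illustration):

    # drop the hot spare from the pool
    zpool remove tank da4

    # add a mirrored log (SLOG) vdev on two SSDs
    zpool add tank log mirror da5 da6

    # confirm the new log vdev
    zpool status tank

With a single, unmirrored log device (zpool add tank log da5), losing that device together with a crash or power failure costs you exactly those in-flight writes.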
> I am also thinking about swapping the spare 4TB disk for a small SSD, but that's immaterial to whether I can perform the change.
I assume you want to swap instead of just add due to lack of open drive slots / ports.
In a zpool of this size, especially a RAIDz<N> zpool, you really want a hot spare and a notification mechanism so you can replace a failed drive ASAP. The resilver time (to replace a failed drive) will be limited by the performance of a _single_ drive for _random_ I/O. See this post http://pk1048.com/zfs-resilver-observations/ for data from one of my resilver operations.
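As a sketch (pool and device names again hypothetical), adding the spare and a crude health check looks like:

    # attach a hot spare to the pool
    zpool add tank spare da7

    # cron-able one-liner: mail root whenever any pool is not healthy
    # (zpool status -x prints "all pools are healthy" when all is well)
    zpool status -x | grep -q 'all pools are healthy' || \
        zpool status -x | mail -s 'zpool needs attention' root

On FreeBSD you can also set daily_status_zfs_enable="YES" in /etc/periodic.conf to get pool status in the daily periodic mail.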
> Also I would appreciate it if someone has some pointers on how to test correctly so I see if there are real benefits before/after this operation.
I use a combination of iozone and filebench to test, but first I characterize my workload. Once I know what the workload looks like, I can adjust the test parameters to match it. If the test results do not agree with observed behavior, I tune the parameters until they do. Recently I needed to test a server before it went live. I knew the workload was NFS for storing VM images, so I ran iozone with 8-64 GB files, 4 KB to 1 MB blocks, and sync writes (the -o option). The measurements matched the observations very closely, so I knew I could trust them and that any changes I made would give valid results.
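Something along these lines reproduces those parameters (the test file path is illustrative):

    # sweep file sizes 8 GB - 64 GB and record sizes 4 KB - 1 MB,
    # with O_SYNC writes (-o) to mimic the NFS sync write path
    iozone -a -n 8g -g 64g -y 4k -q 1m -o -f /tank/test/iozone.tmp

Run it against the same dataset the NFS clients use, before and after the change, and compare the sync write numbers.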
--
Paul Kraus
paul at kraus-haus.org