Gmirror/graid or hardware raid?
Paul Kraus
paul at kraus-haus.org
Thu Jul 9 14:32:48 UTC 2015
On Jul 8, 2015, at 17:21, Charles Swiger <cswiger at mac.com> wrote:
> On Jul 8, 2015, at 12:49 PM, Mario Lobo <lobo at bsd.com.br> wrote:
<snip>
> Most of the PROD databases I know of working from local storage have heaps of
> RAID-1 mirrors, and sometimes larger volumes created as RAID-10 or RAID-50.
> Higher volume shops use dedicated SAN filers via redundant Fibre Channel mesh
> or similar for their database storage needs.
Many years ago I had a client buy a couple of racks FULL of trays of 36 GB SCSI drives (yes, it was that long ago) and partition them so that they only used the first 1 GB of each. This was purely for performance. They were running a relatively large Oracle database with a heavy OLTP transaction load.
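On FreeBSD today that kind of short-stroking is just a partitioning exercise; a rough sketch (the disk name, partition type, and size are placeholders for illustration):

gpart create -s gpt da0
gpart add -t freebsd-zfs -s 1G -a 4k da0   # use only the first (fastest) 1 GB of the disk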
>
>> I thought about zfs but I won't have lots of RAM avaliable
>
> ZFS wants to be run against bare metal. I've never seen anyone setup ZFS within
> a VM; it consumes far too much memory and it really wants to talk directly to the
> hardware for accurate error detection.
ZFS runs fine in a VM, and the notion that it _needs_ lots of RAM is mostly false. I have run a FreeBSD Guest with ZFS and only 1 GB of RAM.
But… ZFS is designed first and foremost for data reliability, not performance. It gets its performance from striping across many vdevs (the ZFS term for the top-level devices you assemble zpools out of), the ARC (adaptive replacement cache), and log devices. Striping requires many drives. The ARC uses any available RAM as a very aggressive FS cache. The log device improves sync writes by committing them to a dedicated device (usually a mirror of fast SSDs).
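If RAM in a Guest really is tight, the ARC can also be capped; a minimal sketch for a FreeBSD guest (the 512M value is just an illustration, tune it to the workload):

# /boot/loader.conf
vfs.zfs.arc_max="512M"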
I generally use ZFS for the Host (and, because of my familiarity with ZFS, I tend to use ZFS for all of the Host filesystems). Then I use UFS for the Guests _unless_ I might need to migrate data in or out of a VM or I need flexibility in partitioning (once you build a zpool, all zfs datasets in it can grab as much or as little space as they need). I can use zfs send / recv (even incrementally) to move data around quickly and easily. I generally turn on compression for VM datasets (I set up one zfs dataset per VM) as the CPU cost is noise and it saves a bunch of space (and reduces physical disk I/O, which also improves performance). I do NOT turn on compression in any ZFS inside a Guest as I am already compressing at the Host layer.
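As a rough sketch of that workflow (dataset, pool, and host names are made up for illustration):

# one compressed dataset per VM at the Host layer
zfs create -o compression=lz4 vm-001/newguest-01

# later, move the guest elsewhere with send/recv
zfs snapshot vm-001/newguest-01@migrate
zfs send vm-001/newguest-01@migrate | ssh otherhost zfs recv tank/newguest-01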
I also have a script that grabs a snapshot of every ZFS dataset every hour and replicates them over to my backup server. Since ZFS snapshots have no performance penalty, the only cost of keeping them around is the space used. This has proven to be a lifesaver: when a Guest is corrupted, I can easily and quickly roll it back to the most recent clean version.
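Stripped to its essentials, that kind of hourly job looks something like this (pool, host, and snapshot names are placeholders, not my actual script):

#!/bin/sh
# snapshot the whole pool recursively, then replicate it incrementally
POOL=vm-001
BACKUP=backup-host
NOW=$(date +%Y%m%d%H)
LAST=$(date -v-1H +%Y%m%d%H)   # previous hour (BSD date syntax)

zfs snapshot -r "${POOL}@hourly-${NOW}"
zfs send -R -i "@hourly-${LAST}" "${POOL}@hourly-${NOW}" | \
    ssh "${BACKUP}" zfs recv -du "backuppool/${POOL}"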
>
>> Should I use the controller raid? Gmirror/Graid? What raid level?
>
> Level is easy: a 4-disk machine is suited for either a pair of RAID-1s, a 4-disk RAID-10 volume,
> or a 4-disk RAID-5 volume.
For ZFS, the number of vdevs and the type of each vdev determine performance. For each vdev type, you can expect roughly the following performance relative to a single disk:
N-way mirror: write 1x, read Nx
RaidZ (Z1/Z2/Z3): write 1x, read 1x minimum, but variable
Note that the performance of a RaidZ vdev does NOT scale with the number of drives in the RAID set nor does it change with the Raid level (Z1, Z2, Z3).
So, for example, a zpool consisting of 4 vdevs, each a 2-way mirror, will have 4x the write performance of a single drive and 8x the read performance. A zpool consisting of 2 vdevs, each a RaidZ2 of 4 drives, will have 2x the write performance of a single drive, and the read performance will be a minimum of 2x that of a single drive. The variable read performance of RaidZ is because RaidZ does not always write full stripes across all the drives in the vdev. In other words, RaidZ is a variable stripe width Raid system. This has advantages and disadvantages :-) Here is a good blog post that describes the RaidZ stripe width: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/
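To make that concrete, those two layouts would be created roughly like this (pool and disk names are placeholders):

# four 2-way mirror vdevs: ~4x single-disk write, ~8x read
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7

# or: two 4-disk RaidZ2 vdevs: ~2x single-disk write, reads at least ~2x but variable
zpool create tank raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7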
I do NOT use RaidZ for anything except bulk backup data where capacity is all that matters and performance is limited by lots of other factors.
I also create a “do-not-remove” dataset in every zpool with a 1 GB reservation and quota. ZFS behaves very, very badly when FULL. This gives me a cushion when things go badly so I can delete whatever used up all the space … yes, ZFS cannot delete files if the FS is completely FULL. I leave the “do-not-remove” dataset unmounted so that it cannot be used.
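Creating that cushion is roughly a one-liner (the pool name is whatever zpool you are protecting):

zfs create -o reservation=1G -o quota=1G -o mountpoint=none rootpool/do-not-remove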
Here is the config of my latest server (names changed to protect the guilty):
root@host1:~ # zpool status
  pool: rootpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rootpool    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0

errors: No known data errors

  pool: vm-001
 state: ONLINE
  scan: none requested
config:

        NAME                             STATE     READ WRITE CKSUM
        vm-001                           ONLINE       0     0     0
          mirror-0                       ONLINE       0     0     0
            diskid/DISK-WD-WMAYP2681136  ONLINE       0     0     0
            diskid/DISK-WD-WMAYP3653359  ONLINE       0     0     0

errors: No known data errors
root@host1:~ # zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rootpool                  35.0G   383G    19K  none
rootpool/ROOT             3.79G   383G    19K  none
rootpool/ROOT/2015-06-10     1K   383G  3.01G  /
rootpool/ROOT/default     3.79G   383G  3.08G  /
rootpool/do-not-remove      19K  1024M    19K  none
rootpool/software         18.6G   383G  18.6G  /software
rootpool/tmp              4.29G   383G  4.29G  /tmp
rootpool/usr              3.98G   383G    19K  /usr
rootpool/usr/home           19K   383G    19K  /usr/home
rootpool/usr/ports        3.63G   383G  3.63G  /usr/ports
rootpool/usr/src           361M   383G   359M  /usr/src
rootpool/var              3.20G   383G    19K  /var
rootpool/var/crash          19K   383G    19K  /var/crash
rootpool/var/log          38.5M   383G  1.19M  /var/log
rootpool/var/mail         42.5K   383G  30.5K  /var/mail
rootpool/var/tmp            19K   383G    19K  /var/tmp
rootpool/var/vbox         3.17G   383G  2.44G  /var/vbox
vm-001                     166G   283G    21K  /vm/local
vm-001/aaa-01             61.1G   283G  17.0G  /vm/local/aaa-01
vm-001/bbb-dev-01         20.8G   283G  13.1G  /vm/local/bbb-dev-01
vm-001/ccc-01             21.5K   283G  20.5K  /vm/local/ccc-01
vm-001/dev-01             4.10G   283G  3.19G  /vm/local/dev-01
vm-001/do-not-remove        19K  1024M    19K  none
vm-001/ddd-01             4.62G   283G  2.26G  /vm/local/ddd-01
vm-001/eee-dev-01         16.6G   283G  15.7G  /vm/local/eee-dev-01
vm-001/fff-01             7.44G   283G  3.79G  /vm/local/fff-01
vm-001/ggg-02             2.33G   283G  1.77G  /vm/local/ggg-02
vm-001/hhh-02             8.99G   283G  6.80G  /vm/local/hhh-02
vm-001/iii-repos          36.2G   283G  36.2G  /vm/local/iii-repos
vm-001/test-01            2.63G   283G  2.63G  /vm/local/test-01
vm-001/jjj-dev-01           19K   283G    19K  /vm/local/jjj-dev-01
root@host1:~ #
--
Paul Kraus
paul at kraus-haus.org