ZFS on Hardware RAID

Borja Marcos borjam at sarenet.es
Mon Jan 21 14:11:26 UTC 2019



> On 19 Jan 2019, at 20:29, jdelisle <jdelisle at gmail.com> wrote:
> 
> You'll find a lot of strong opinions on this topic over at the FreeNAS
> forums. I too wish an authoritative, knowledgeable SME would answer
> and thoroughly explain the inner workings and the risks involved.  Most of
> the FreeNAS forum posts on this topic devolve into hand-waving and blurry
> incomplete explanations that end in statements like "trust me, don't do
> it".  I'd love to understand why not.  I'm curious and eager to learn more.

Alright, let me start with one of the many reasons to avoid “hardware RAIDs”.

Disk redundancy in a RAID system offers several benefits. It can not only detect
data corruption (and not all systems are equally good at that) but it can also help
repair it. Of course with adequate redundancy you can survive the loss of one
or several drives.

But as I said, not all corruption detection schemes are created equal. ZFS has a very
sophisticated and effective one, developed as an answer to the ever increasing size of
storage systems and files. Everything has become so large that the probability of an
unnoticed corrupted block is no longer negligible. Some storage systems have indeed
suffered from silent data corruption, for instance.
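(As a quick illustration, and assuming a FreeBSD box with a pool named “tank”: a scrub
makes ZFS read back every block and verify its checksum, and any mismatch shows up in
the CKSUM column of the status output.)

    # zpool scrub tank
    # zpool status tank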

Common “hardware based RAID” systems, which really means “software running on a
small, embedded processor”, usually have quite limited checksum schemes. ZFS has
a much more robust one.

So, now let’s assume we are setting up a server and we have two choices: use the HBA
in “hardware RAID” mode, or use it just as a common HBA relying on ZFS for redundancy.

Option 1: Hardware RAID. This is the option many people prefer because, well,
“hardware” sounds more reliable.

Option 2: ZFS using disks, period. I refuse to use the JBOD term because it’s an added
layer of confusion over what should be a simple subject. 
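Just to make the difference concrete, this is roughly how the two options end up looking
from the FreeBSD side (pool and device names are only an example, they depend on your
controller and driver):

    Option 1: the controller exposes a single logical volume, say /dev/mfid0,
    so ZFS sees one “disk” and has no redundancy of its own:

        # zpool create tank /dev/mfid0

    Option 2: the HBA exposes the raw disks, say da0 through da3, and ZFS
    handles the redundancy itself:

        # zpool create tank raidz2 da0 da1 da2 da3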

Now let’s imagine that there’s some data corruption on one of the disks. The corruption is
not detected by the hardware RAID but it’s promptly detected by the more elaborate
ZFS checksum scheme. 

If we chose option 1, ZFS will let us know that there is a corrupted file. But because
the redundancy is provided only by the underlying “hardware RAID”, ZFS has no redundant
copy of its own and won’t be able to heal anything.

Had we chosen option 2, however, and assuming the pool has some redundancy, ZFS
would not only report the data corruption incident, it would also return the correct data
and rewrite the damaged blocks, unless the same blocks were corrupted on several of
the disks at once.
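After a scrub, with option 2 the incident typically shows up as a non-zero CKSUM counter
on the affected disk and nothing else. A sketch of what it tends to look like (not literal
output, and again assuming a pool named “tank”):

    # zpool status -v tank
      ...
        raidz2-0  ONLINE       0     0     0
          da0     ONLINE       0     0     3   <- corrupted blocks, rebuilt from redundancy
          da1     ONLINE       0     0     0
      ...
    errors: No known data errors

With option 1 the same scrub ends with “Permanent errors have been detected in the
following files:” and a list of files you can only recover from backup.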

Given that ZFS has much better error detection and correction than most “hardware
RAID” options (high-end storage subsystems excepted), running ZFS on a logical volume
built on a “hardware RAID” is roughly equivalent to running it on a single disk with no
redundancy: you won’t get the real benefit of the better recovery mechanisms
offered by ZFS.

Do you want another reason? If you use a “hardware RAID” solution you are stuck with it:
if you suffer a controller failure you will need the same (or compatible) hardware to recover.
With ZFS you can move the disks to a different system with a different HBA. As long as ZFS
can access the disks, with no odd stuff in between, the pool will work regardless of the
hardware manufacturer.
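In practice that move is just an export and an import (again assuming a pool called “tank”):

    old-server# zpool export tank
       ... move the disks to the new machine ...
    new-server# zpool import tank

And if the old controller died before you had a chance to export the pool, “zpool import -f tank”
on the new box will still pick it up.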

There are other important performance reasons related to the handling of data and metadata,
but I think that the first reason I mentioned (ZFS error recovery and healing capabilities) is
a strong enough motivation to avoid “hardware RAIDs”.



Borja.



