ZFS and "internal" mfi RAID cards, the eternal question

Borja Marcos borjam at sarenet.es
Fri Jan 14 11:12:07 UTC 2011


Hello,

I've been using FreeBSD with ZFS for some time, with varying degrees of success. Unfortunately, it turns out that Dell, the preferred manufacturer here, doesn't offer an appropriate configuration.

They insist on selling those "RAID cards" that, of course, get in the way of running ZFS properly. With these cards you are basically left with two options: either define a large RAID5 logical volume and use ZFS as a plain file system on top of it, or define a single-disk RAID0 volume for each physical disk.
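To make the two options concrete, this is more or less what each one looks like from the ZFS side (a rough sketch; device and pool names are made up, and the card exports each logical volume as an mfidN device):

  # Option 1: one big RAID5 logical volume, ZFS on top as a plain file system
  zpool create tank mfid0

  # Option 2: one single-disk RAID0 volume per physical disk,
  # with redundancy handled by ZFS itself
  zpool create tank raidz mfid0 mfid1 mfid2 mfid3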

I don't like the RAID5 option because you lose some of the most interesting ZFS functionality: with the card handling redundancy, ZFS can detect corrupted data but has no replica of its own to repair it from. Also, I'm not sure how robust it would be.

(I've done the following tests with a Dell PERC H700.)

And the RAID0 volumes "solution" is not valid either. The RAID card sits between the disks and ZFS, and some things simply don't work. For example, if you remove one of the disks (or it suffers a failure), the RAID card will cry foul and disable the RAID0 volume on it completely. Once this happens, there's no way to replace the disk or put it online again: you must reboot the machine and go through the RAID configuration, including an "IMPORT FOREIGN CONFIGURATION" step. It's plain insane.

And there's another option I've been using on one machine recently, with some success. When I asked about these problems long ago, Scott Long gave me instructions to patch mfi_cam.c so that the physical disks become visible as plain SAS targets. As it came with a "beware, at your own risk" warning attached, I limited it to a single machine. So far so good, although I must remember to re-apply the patch to mfi_cam.c whenever I update the system.
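For reference, the patched file builds into the mfip glue module, which I load at boot with the usual loader.conf knob (a sketch, assuming the module is built from the patched source):

  # /boot/loader.conf
  mfip_load="YES"

Once it is loaded, the physical disks show up in "camcontrol devlist" as ordinary daN targets.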

There are indeed some problems using it. Of course, I avoided defining any logical volumes on the RAID card, assuming that it would then leave the disks alone. At least it senses them and places them in the "UNCONFIGURED_GOOD" state. I have also seen interference (unexpected sense status) if I try, for example, to take a disk offline with "camcontrol stop". I suspect there might be other problems.
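For the record, the command that triggers the unexpected sense status is just the standard CAM spin-down (device name made up):

  camcontrol stop da4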

But so far I have had no problems with ZFS running this way. Today I've been trying a new machine (with a PERC H700) and doing some silly tests, like removing a disk or swapping disks around (I am using GPT labels for the disks so that ZFS can recognize them regardless of controller/backplane position), and I haven't managed to damage the ZFS pool. Everything worked like a charm.
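In case it helps anybody, this is roughly the labeling scheme I use (device and label names are made up):

  # Label each disk once, when it is first installed
  gpart create -s gpt da2
  gpart add -t freebsd-zfs -l disk02 da2

  # Build the pool on the labels instead of the daN device names
  zpool create tank mirror gpt/disk02 gpt/disk03

When a disk is replaced, the new one gets the same label and a plain "zpool replace tank gpt/disk02" brings the pool back, no matter which slot the disk ends up in.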

Two problems remain with the mfip approach, though. First, it's risky in its present state, and I'm not sure whether future versions of the mfi firmware could cause problems by touching disks I'm accessing through mfip. Second, there is a problem with driver shutdown: if I try to shut down or reboot the machine (I mean good old command-line shutdown -r or shutdown -p), the driver gets stuck and the machine requires a manual power-down.

Would it be possible to have this option better integrated, so that it exports the disks not actually in use by the RAID subsystem (I assume that means disks in the UNCONFIGURED_GOOD state) and, of course, so that the system can be rebooted/halted without problems?

Maybe there would be a way to just tell the card firmware to shut up and act like a simple, well-behaved SAS card?





Borja.


