To JBOD or just to pass, that is the question
Borja Marcos
borjam at sarenet.es
Wed Jan 25 11:33:51 UTC 2012
Hi
Sorry for the cross-posting, but this pertains to both filesystems and host adapters.
Since ZFS was added to FreeBSD, it has collided with a standard practice of server manufacturers: including "RAID" cards.
ZFS is designed to control the disks directly, and RAID cards tend to get in the middle, complicating things. Some cards do provide plain access to the disks, working as simple host adapters without added features, as they should. But so far I have found two problematic cases: mfi and aac cards.
Of course, the standard suggestion is to create a logical volume for each disk, so that you have the rough equivalent of a hard disk attached to a host adapter. However, this approach has its drawbacks:
- An added layer of complexity. At least with mfi cards, replacing a broken disk involves a bit of device-dependent voodoo incantation. It should be a matter of physically replacing the disk and maybe doing a camcontrol rescan, nothing else (see the sketch after this list).
- Are such volume-per-disk setups transportable from one controller to another? What happens if I need to install the disks on a different machine with a different host adapter? ZFS provides that interoperability, but the RAID cards can be a problem.
- More complexity: what, for instance, is the caching behavior of the RAID card? ZFS decides when to flush and when not to. Battery-backed RAID cards show (as far as I know) configuration-dependent caching, maybe ignoring cache-flush commands received from the OS storage subsystem? At least there's no detailed documentation, as far as I know. So I tend to dislike that "firmware in the middle".
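For reference, this is roughly all that a disk replacement should involve with a plain host adapter. A minimal sketch, assuming the failed disk was da3 and the pool is called "tank" (both names invented for the example):

# camcontrol rescan all
(the replacement shows up as da3 again)
# zpool replace tank da3
# zpool status tank
(watch the resilver progress)

With the volume-per-disk approach there is an extra, controller-specific step to recreate the logical volume before the OS sees the replacement disk at all.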
Long ago I asked for help on freebsd-scsi and Scott Long sent a simple patch that makes the hard disks, shown as pass-through devices, available to the "da" driver, hence becoming real hard disks. It's just a matter of deleting all the logical volumes before using the disks (a rough sketch follows the link below). I've been running this on a machine with mfi since 2007, and so far so good. The machine is now on 8.1 and I hope to update it to 9 soon.
The freebsd-scsi thread: http://lists.freebsd.org/pipermail/freebsd-scsi/2007-October/003224.html
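For anyone trying the same thing on a newer system, the volume cleanup could look like this, assuming mfiutil(8) is available (it wasn't back in 2007, so take this as an illustration rather than what I actually typed):

# mfiutil show volumes
# mfiutil delete mfid0
(repeat the delete for each logical volume listed)

Once no volumes are left, with the patch the disks appear as plain daN devices.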
The behavior in my torture tests was good. One of the things I usually do when testing a configuration is to remove a disk suddenly while the system is working. That was a pain in the ass with the mfi thingy, but really straightforward with the disks accessed in pass-through mode.
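The test itself needs nothing fancy; I just watch the pool from another terminal while pulling the disk, with something like:

# while true; do zpool status -x; sleep 5; done

With pass-through disks the pool simply goes DEGRADED and the replacement procedure works as expected.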
Now I am installing a Sun X4240 server and, surprisingly, I've stumbled upon a similar problem. This time it's an "aac" card:
aac0: <SG-XPCIESAS-R-IN> mem 0xdfe00000-0xdfffffff irq 17 at device 0.0 on pci4
aac0: Enabling 64-bit address support
aac0: Enable Raw I/O
aac0: Enable 64-bit array
aac0: New comm. interface enabled
aac0: Sun STK RAID INT, aac driver 2.1.9-1
aacp0: <SCSI Passthrough Bus> on aac0
aacp1: <SCSI Passthrough Bus> on aac0
aacp2: <SCSI Passthrough Bus> on aac0
This is a disk as it appears in /var/run/dmesg.boot:
da0: <SEAGATE ST914603SSUN146G 0868> Fixed Direct Access SCSI-5 device
da0: 0KB/s transfers
da0: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
and this is what I see from camcontrol:
# camcontrol devlist
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 8 lun 0 (da0,pass0)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 9 lun 0 (da1,pass1)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 10 lun 0 (da2,pass2)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 11 lun 0 (da3,pass3)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 12 lun 0 (da4,pass4)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 13 lun 0 (da5,pass5)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 14 lun 0 (da6,pass6)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 15 lun 0 (da7,pass7)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 16 lun 0 (da8,pass8)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 17 lun 0 (da9,pass9)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 18 lun 0 (da10,pass10)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 19 lun 0 (da11,pass11)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 20 lun 0 (da12,pass12)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 21 lun 0 (da13,pass13)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 22 lun 0 (da14,pass14)
<SEAGATE ST914603SSUN146G 0868> at scbus6 target 23 lun 0 (da15,pass15)
<LSILOGIC SASX28 A.0 5021> at scbus8 target 0 lun 0 (ses0,pass16)
<ADAPTEC Virtual SGPIO 0 0001> at scbus8 target 1 lun 0 (ses1,pass17)
<ADAPTEC Virtual SGPIO 1 0001> at scbus8 target 2 lun 0 (ses2,pass18)
<TSSTcorp CD/DVDW TS-T632A SR03> at scbus15 target 0 lun 0 (cd0,pass19)
<SanDisk Cruzer Blade 1.20> at scbus16 target 0 lun 0 (da16,pass20)
# camcontrol inq 6:8:0
pass0: <SEAGATE ST914603SSUN146G 0868> Fixed Direct Access SCSI-5 device
pass0: Serial Number 000946821D70 3SD21D70
pass0: 3.300MB/s transfers
The reported transfer speed seems silly, but Bonnie++ on a 16-disk raidz2 gave 200+ MB/s block writes and 700+ MB/s block reads, so it seems to be working.
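For completeness, the test was along these lines (pool name and Bonnie++ arguments are illustrative, not my exact command line):

# zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
      da8 da9 da10 da11 da12 da13 da14 da15
# bonnie++ -d /tank -u root

So whatever the inquiry reports, the actual data path performs fine.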
So far there's just one side effect of accessing the disks in pass-through mode: I cannot reboot the machine; it seems to hang after flushing the buffers. This happens with both the mfi and the aac drivers.
Just wondering: should we maybe have a tunable allowing aac and mfi to bypass the RAID firmware thingy? And is there any kind of exhaustive test we could perform to make sure that the card isn't doing weird things?
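To illustrate what I have in mind, something like this in /boot/loader.conf (the tunable names are invented for the example):

# hypothetical tunables, names made up for illustration
hw.mfi.allow_cam_disk_passthrough="1"
hw.aac.allow_cam_disk_passthrough="1"

That would leave the default behavior unchanged for people who want the RAID firmware in charge.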
I've noticed, on the aac machine I'm testing, that camcontrol tags shows just one "device opening". I'm wondering whether it would be safe to increase it. Right now the machine isn't in production, so I can perform some tests.
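Concretely, I mean something like this (device name and tag count are just examples):

# camcontrol tags da0 -v
(shows the current number of device openings, among other counters)
# camcontrol tags da0 -N 32
(tries raising the number of openings)

followed by some heavy load to see whether the card misbehaves.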
Best regards,
Borja.