LSI - MR-Fusion controller driver <mrsas> patch and man page

Borja Marcos borjam at sarenet.es
Tue Mar 25 14:31:30 UTC 2014


On Mar 25, 2014, at 12:42 PM, Desai, Kashyap wrote:

> Borja:
> 
> <mrsas> driver will attach RAID volumes and JBOD (SysPD) to the CAM layer.  It is not good to expose a hidden RAID volume, or what we call a pass-through device here, to the OS for many reasons.  Other than management things like SMART monitoring, we cannot/should not do file system IO on pass-through devices.

Of course it's not a good idea to expose drives that are part of a logical volume. But unconfigured drives should be exposed. Read on, please ;)

> With <mfi> it might be true that users always do file system IO on the <mfiX> device and consider /dev/daX as a pass-through device... With <mrsas> all devices will be seen as <daX>. You cannot identify which will be a pass-through and which is a configured device using the LSI config utils.

Exposing devices as "da" should not be a merely "aesthetic" decision. The "da" driver has features intended for direct access to disks, not for logical volumes created by other devices such as advanced RAID cards. For example, the "da" device can issue TRIM commands, it reads device serial numbers (which GEOM can now use to identify disks), etc. Disks have become more complicated with that "advanced format" thing, so I think it's very important for disks to be directly accessible if you want or need it. Of course other features might be introduced in the future: features that may be added to the "da" driver but which will probably be useless for a logical device, even outright inappropriate.
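Just to make this concrete, here is a rough sketch (assuming the disk shows up as /dev/da0; any GEOM provider would do) of how a plain userland program can read the ident/serial number and the logical/physical sector sizes that GEOM and the "da" path expose, using nothing but the standard disk ioctls. TRIM from userland goes down the same path as BIO_DELETE (the DIOCGDELETE ioctl); I have left it out of the sketch because it discards data.

/* ident.c - rough sketch; assumes the disk is visible as /dev/da0 */
#include <sys/param.h>
#include <sys/disk.h>	/* DIOCGIDENT, DIOCGSECTORSIZE, DIOCGSTRIPESIZE */
#include <sys/ioctl.h>

#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	char ident[DISK_IDENT_SIZE];
	u_int secsize;
	off_t stripesize, mediasize;
	int fd;

	if ((fd = open("/dev/da0", O_RDONLY)) < 0)
		err(1, "open");

	/* Serial number / ident, the same string GEOM uses to label disks. */
	if (ioctl(fd, DIOCGIDENT, ident) < 0)
		err(1, "DIOCGIDENT");
	/* Logical sector size and physical ("advanced format") stripe size. */
	if (ioctl(fd, DIOCGSECTORSIZE, &secsize) < 0)
		err(1, "DIOCGSECTORSIZE");
	if (ioctl(fd, DIOCGSTRIPESIZE, &stripesize) < 0)
		err(1, "DIOCGSTRIPESIZE");
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0)
		err(1, "DIOCGMEDIASIZE");

	printf("ident %s, %u bytes/sector (physical %jd), %jd bytes total\n",
	    ident, secsize, (intmax_t)stripesize, (intmax_t)mediasize);

	close(fd);
	return (0);
}

None of this needs to know anything about the controller; it only works if the disk is really attached as a direct access device.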

I would suggest that you offer a choice, and, most critically, a _clear_ _choice_, as you have different kinds of customers. Some will want/need logical volumes and advanced RAID features, others won't. On some of my machines I am actually doing *both* things at the same time: I may have a RAID-card-based mirror for certain tasks, maybe with a UFS filesystem on it, while relying on pass-through for the rest of the devices, on which I use ZFS.

I think you should use a specific name for the logical devices, as the mfi driver does. If I see an "mfid" device name it's clear that it's a logical device, not a "bare metal" hard disk, and that its behavior and features depend mainly on the logical-device magic in the card.

And you should offer a perfectly transparent pass-through option, maybe restricted to disks not configured as part of a RAID (to avoid accidents), i.e., what you now call "syspd" mode. These disks, ideally, should not be assigned to a special logical-volume-like driver ("mfisyspd" or its equivalent), but to the "da" driver, so that all of the features I expect from a bare metal hard disk work: SMART, access to mode pages, detecting sector sizes, serial numbers, whatever, would all work without hiccups.
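To illustrate what "without hiccups" means to me, here is a hypothetical sketch (da0 assumed, error handling kept to a minimum) of talking SCSI directly to such a disk through the normal CAM plumbing, the same path camcontrol and smartmontools use. Nothing in it is specific to any controller:

/* inq.c - sketch only; build with: cc -o inq inq.c -lcam */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#include <cam/cam.h>
#include <cam/cam_ccb.h>
#include <cam/scsi/scsi_all.h>
#include <cam/scsi/scsi_message.h>
#include <camlib.h>

int
main(void)
{
	struct cam_device *dev;
	union ccb *ccb;
	struct scsi_inquiry_data inq;

	if ((dev = cam_open_device("/dev/da0", O_RDWR)) == NULL)
		errx(1, "%s", cam_errbuf);
	if ((ccb = cam_getccb(dev)) == NULL)
		errx(1, "cam_getccb failed");

	/* Clear everything in the CCB after the header, as camcontrol does. */
	memset(&(&ccb->ccb_h)[1], 0,
	    sizeof(struct ccb_scsiio) - sizeof(struct ccb_hdr));
	memset(&inq, 0, sizeof(inq));

	/* Build a standard INQUIRY CDB into the CCB. */
	scsi_inquiry(&ccb->csio,
	    /*retries*/ 1,
	    /*cbfcnp*/ NULL,
	    /*tag_action*/ MSG_SIMPLE_Q_TAG,
	    /*inq_buf*/ (uint8_t *)&inq,
	    /*inq_len*/ sizeof(inq),
	    /*evpd*/ 0,
	    /*page_code*/ 0,
	    /*sense_len*/ SSD_FULL_SIZE,
	    /*timeout*/ 5000);
	ccb->ccb_h.flags |= CAM_DEV_QFRZDIS;

	if (cam_send_ccb(dev, ccb) < 0 ||
	    (ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP)
		errx(1, "INQUIRY failed");

	printf("vendor \"%.8s\" product \"%.16s\" rev \"%.4s\"\n",
	    (const char *)inq.vendor, (const char *)inq.product,
	    (const char *)inq.revision);

	cam_freeccb(ccb);
	cam_close_device(dev);
	return (0);
}

The same binary works whether the disk hangs off mps, ahci or anything else, which is exactly the kind of transparency I am asking for here.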

Doing it the current "syspd" way means that any new feature added to disks must be added to the card firmware and to the "syspd" portion of the driver. Keeping clear access to the SAS (or SATA-on-SAS) devices, with no other manipulation, would give the "da" driver immediate access to those features with no need to add support to the card firmware and driver.


> It is not a complex code change if a pass-through device is required for <mrsas>, but it is just of no use and more error prone to expose devices as pass-through.

It is certainly error prone if you are using logical devices. But if you are not using them (my case, and there are many others in this situation), the lack of a well-supported pass-through device is itself error prone.

From a pure engineering point of view, it's a bad idea to add unnecessary software layers. Advanced RAID card features are a lifesaver for "classic" filesystems such as UFS/FFS, EXTwhateverFS, NTFS, etc., but they can get in the way of other filesystems such as ZFS, which is designed to perform the functions of a RAID device itself.

> None of the LSI drivers do this, including <mps> and <mrsas> in FreeBSD + <megaraid_sas> and <mpt2sas>/<mpt3sas> in Linux.

I've been using pass-through disks on Adaptec RAID cards (aac) and LSI Logic cards (mps and mfi) with different levels of success for years. It can be tricky, but ZFS works best with direct access to the disks.

> Can you express what functionality you think is missing if there is no pass-through device?

Of course. Some of the functionality I would miss without a pass-through device:

- Inability to support problematic disks with "quirks". The "da" driver offers a flexible mechanism for that (see the sketch after this list). Without the da driver I lose that ability, and you will agree with me that getting a manufacturer (LSI) to update a card's firmware is much harder than patching a quirk table myself if needed.

- Inability to support future/special features without a firmware update for the card. Examples are the diversity of block sizes in SSDs or, more recently, TRIM. ZFS on FreeBSD now supports TRIM, and it's important for performance and drive health. How does "syspd" handle it currently?
 
- Again, I will insist that additional software layers are a bad idea.

- Also, one of the "features" of LSI cards represents a serious operational issue: the persistent assignment of target numbers to disk serial numbers, keeping a table of target/serial-number mappings in NVRAM. There were some recent messages on this list about that problem, and it seems to happen even when using pass-through devices.
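(This is the sketch I referred to in the first point above. It is roughly what an entry in the da(4) quirk table in sys/cam/scsi/scsi_da.c looks like; the vendor/product strings and the combination of flags below are invented for illustration. The point is that a one-entry patch like this is something I can write, test and deploy myself, instead of waiting for a new card firmware.)

/* Fragment of the in-kernel da(4) quirk table; the entry itself is made up. */
static struct da_quirk_entry da_quirk_table[] =
{
	{
		/*
		 * Hypothetical SSD that chokes on SYNCHRONIZE CACHE
		 * and does not report its 4K physical sectors.
		 */
		{T_DIRECT, SIP_MEDIA_FIXED, "ACME", "TurboSSD*", "*"},
		/*quirks*/ DA_Q_NO_SYNC_CACHE | DA_Q_4K
	},
	/* ... the existing entries ... */
};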

In the past I have had problems with ZFS and the "old" way of creating "pseudo JBOD" devices on LSI cards by creating a RAID 0 logical volume for each disk. For example, hot-swapping a broken disk is more error prone if, apart from just extracting the disk and adding a new one, I need to run some tool to have it recognised by the card firmware. It adds unnecessary complexity. Moreover, in some cases (I can't recall the exact details, as it happened several years ago) it requires a reboot, which defeats the purpose of hot-swappable disks in the first place.

Please don't underestimate the operational impact of all this. An operator swapping a disk at 3 am should not need to do any complex check to determine which disk to extract. Nor should he/she need additional actions such as "mfiutil online this", activate that, or, of course, a reboot, to have it recognised. ZFS (and, I presume, other advanced filesystems) has its own commands for that, which include their own sanity checks and do their best to avoid trouble.

> Are you doing ZFS (file system IO) on a pass-through device?

Indeed I am. And I know there are many successful setups doing the same.

> If yes, then why can't you create JBOD/SysPD  for that purpose?

It's explained above, but I will summarize:

- Plain, simple, good engineering practice (avoiding unneeded software layers).
- Access to special/future features on disks.
- Better observability (monitoring, etc.).
- Simpler operational procedures, which means safer system operation and better reliability.

Let me be brutally honest here and, please, take no offense but take it as feedback from a customer. Right now,
advanced RAID cards can be more of a liability than a desirable feature. Look at all the places where people repurpose
RAID cards as simple HBAs, doing all sorts of unsupported voodoo.

Ideally this shouldn't happen, but we are somewhat forced into it by server manufacturers. At some point, for example,
Dell refused to sell "IT mode" LSI2008 cards for internal devices, selling them only with external SAS connectors. So many
people just reflash the internal "IR firmware" cards to "IT mode" so that they can be simple HBAs, even though they still
pose a problem with that target/serial-number feature in NVRAM. I have an IBM server here with an onboard Invader card
which, obviously, has many more features.

By defining some design guidelines for your hardware, firmware, and drivers, however, you can get to a win-win solution. If a
card can fulfill both roles perfectly (advanced RAID features and plain HBA) it will no longer be a liability. The same hardware
will be appropriate for many purposes, which is even better for the purchasing departments of us, your final customers: no need
to keep track of several SKUs depending on the intended purpose, and the same card usable for, say, NTFS and ZFS depending only
on configuration.

And the design guidelines I am suggesting are simple:

- A fully functioning pass-through mode with minimal surprises: the simplest, most transparent possible access from the CAM
layer to the SAS/SATA commands, so that true pass-through devices get attached to the right peripheral drivers such as "ses", "da", "sa", etc. This should be a core feature, not an add-on intended merely to ease monitoring.

- Making that transparent pass-through mode clearly distinguishable from the logical-volume magic, so that the device name reflects
its nature and purpose. "mfid" (or "mrsasd", or whatever you like) would be the logical devices, which would not be attached to the standard CAM drivers.


You could just repurpose the "syspd" configuration in the newer cards/firmware versions so that drives marked as "syspd" become perfectly transparent pass-throughs.

Please consider it; I am sure you will have many happy customers.

(And I hope you endured reading this message until the end!!)


Thank you!


Borja.


