RFC: GEOM MULTIPATH rewrite
Alexander Motin
mav at FreeBSD.org
Tue Nov 1 19:24:10 UTC 2011
On 01.11.2011 19:50, Dennis Kögel wrote:
> Not sure if replying on-list or off-list makes more sense...
Replying on-list shares the experience with other users.
> Anyway, some first impressions, on stable/9:
>
> The lab environment here is an EMC VNX / Clariion SAN, which has two Storage Processors, connected to different switches, connected to two isp(4)s on the test machine. So at any time, the machine sees four paths, but only two are available (depending on which SP owns the LUN).
>
> 580# camcontrol devlist
> <DGC VRAID 0531> at scbus0 target 0 lun 0 (da0,pass0)
> <DGC VRAID 0531> at scbus0 target 1 lun 0 (da1,pass1)
> <DGC VRAID 0531> at scbus1 target 0 lun 0 (da2,pass2)
> <DGC VRAID 0531> at scbus1 target 1 lun 0 (da3,pass3)
> <COMPAQ RAID 1(1VOLUME OK> at scbus2 target 0 lun 0 (da4,pass4)
> <COMPAQ RAID 0 VOLUME OK> at scbus2 target 1 lun 0 (da5,pass5)
> <hp DVD D DS8D3SH HHE7> at scbus4 target 0 lun 0 (cd0,pass6)
>
> I miss the ability to "add" disks to automatic mode multipaths, but I (just now) realized this only makes sense when gmultipath has some kind of path checking facility (like periodically trying to read sector 0 of each configured device, this is what Linux' devicemapper-multipathd does).
In automatic mode, other paths are supposed to be detected by reading
metadata. If in your case some paths are not readable, automatic mode
can't work as expected. By the way, could you describe how your
configuration is supposed to work, i.e. when do the other paths start
working? Only when the second storage processor itself detects that the
first one is dead, or is it supposed to be controlled somehow? If you
boot with verbose messages, what SCSI errors do you see on the console
or in the logs when trying to access the second storage processor?
Speaking about user-level daemons, it should be possible to do the same:
write an rc.d script to create the multipath device in manual mode on
boot, and hook a small script into devd.conf that checks the models,
serial numbers, or whatever else of newly detected devices and manually
adds them. At least if you are not booting from that device.
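
The matching step of such a devd-triggered script could look roughly
like the sketch below. It is only an illustration: the function name
plan_additions, the serial-number strings, and the idea of passing the
serials in as a dictionary are all assumptions. How the serials are
collected (for example from camcontrol output) is left out, and the
sketch only builds "gmultipath add" command lines without running them.

```python
from typing import Dict, List


def plan_additions(name: str, wanted_serial: str,
                   members: List[str],
                   serials: Dict[str, str]) -> List[List[str]]:
    """Build 'gmultipath add' argv lists for devices matching a serial.

    name          -- multipath device created earlier in manual mode
    wanted_serial -- serial number identifying the LUN (hypothetical value)
    members       -- providers that are already paths of `name`
    serials       -- device name -> serial number of detected devices
    """
    commands = []
    for dev in sorted(serials):
        if dev in members:
            continue  # already a path of this multipath device
        if serials[dev] == wanted_serial:
            commands.append(["gmultipath", "add", name, dev])
    return commands


if __name__ == "__main__":
    # Hypothetical serial numbers; real ones would come from the devices.
    detected = {
        "da0": "SN-LUN0", "da1": "SN-LUN0",
        "da2": "SN-LUN0", "da3": "SN-LUN0",
        "da4": "SN-LOCAL",
    }
    for argv in plan_additions("TESTBSD", "SN-LUN0", ["da1", "da3"], detected):
        print(" ".join(argv))
```

Keeping the decision separate from command execution makes the script
easy to dry-run before it is hooked into devd.conf.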
Another possible way I see is to make geom_multipath analyze device
models and serial numbers at the kernel level and attach devices based
on them, without using metadata.
> First I did a test with active/active and manual mode (gmultipath create -A TESTBSD da{0,1,2,3}).
>
> This worked fine, and immediately kicked da{0,2} (which are not available) after writing something.
Predictable. Errors caused the non-working paths to be marked as failed.
If all working paths fail, these will be retried.
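
As a toy model of that behaviour (not the kernel code; the class name
PathSet and its methods are made up for illustration, and clearing all
failure marks at once is an assumption about how "retried" could be
realized):

```python
from typing import Dict, List, Optional


class PathSet:
    """Toy model of failure marking across several paths to one LUN."""

    def __init__(self, paths: List[str]) -> None:
        # Insertion order of the dict doubles as the order paths are tried.
        self.failed: Dict[str, bool] = {p: False for p in paths}

    def pick(self) -> Optional[str]:
        """Return the first path that is not marked as failed."""
        for path, is_failed in self.failed.items():
            if not is_failed:
                return path
        return None

    def report_error(self, path: str) -> None:
        """Mark a path as failed after an I/O error on it."""
        self.failed[path] = True
        if all(self.failed.values()):
            # No usable path is left: clear the marks so that the
            # previously failed paths are tried again.
            for p in self.failed:
                self.failed[p] = False


if __name__ == "__main__":
    paths = PathSet(["da0", "da1", "da2", "da3"])
    paths.report_error("da0")
    paths.report_error("da2")
    print(paths.pick())
```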
> It's quite unexpected that act/act is apparently based on the writer's process id or CPU affinity (guessing) -- I needed multiple parallel dd(1) jobs to get gmultipath to use more than one path, but then it worked fine. (This also means that a dysfunct path isn't recognized before some I/O is attempted). Performance was similar to an identical Linux setup (for a very simple sequential I/O test at least).
It is not affinity, just a dumb probe order. At present geom_multipath
balances only the immediate load, not the average. So if you have only
one request at a time, it will always use the same path. If you have
only one initiator, this should not be important, but if there are
several, it should probably be improved.
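
A small simulation may make the effect clearer. This is a sketch under
assumptions, not the actual geom_multipath logic: choose_path picks the
path with the fewest requests in flight and breaks ties by probe order,
and run is a hypothetical driver that keeps a fixed number of requests
outstanding and completes the oldest one first.

```python
from typing import Dict, List


def choose_path(paths: List[str], in_flight: Dict[str, int]) -> str:
    """Pick the path with the fewest requests currently in flight.

    min() returns the first minimum, so ties go to the path that comes
    first in probe order.
    """
    return min(paths, key=lambda p: in_flight[p])


def run(paths: List[str], concurrency: int, requests: int) -> Dict[str, int]:
    """Issue `requests` requests with at most `concurrency` outstanding.

    Returns how many requests were sent down each path.
    """
    in_flight = {p: 0 for p in paths}
    used = {p: 0 for p in paths}
    active: List[str] = []  # paths of outstanding requests, oldest first
    for _ in range(requests):
        if len(active) == concurrency:
            done = active.pop(0)  # oldest request completes
            in_flight[done] -= 1
        path = choose_path(paths, in_flight)
        in_flight[path] += 1
        used[path] += 1
        active.append(path)
    return used


if __name__ == "__main__":
    print(run(["da1", "da3"], concurrency=1, requests=8))
    print(run(["da1", "da3"], concurrency=2, requests=8))
```

With one outstanding request every path is idle at selection time, so
the first path in probe order always wins. With two outstanding
requests the load spreads over both paths, which matches the
observation about needing parallel dd(1) jobs.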
> Using automatic mode (gmultipath label -A TESTBSD da{1,3,0,2}) silently ignores da{0,2}, which are not available. "list" does not show them at all. I guess it should at least throw an error.
At present the "label" command only hints the kernel to retaste the
other paths. If the kernel is unable to read metadata from the specified
devices, or there is no metadata, they just silently won't be added. I
can add metadata checking at user level. It will not guarantee success,
but it should at least catch basic mistakes.
> Do any of the layers below gmultipath currently have information about metadata like the volume's WWN? This could be helpful in status / list output, maybe for other things, I guess...
That information may be present inside CAM, but not above it at the
moment. Only the device model and serial number are exported to GEOM.
> Another thought: As I guess "automatic" mode is meant to be the common case, the choice of "create" vs. "label" might be misleading. Users in a hurry will probably just look through the built-in help, see "create", and then use it. (I'm guilty as charged here, I didn't realize at first that I was using manual mode.)
It is a somewhat unified convention among several GEOM classes that
"label" is for the automatic method and "create" is for the manual one.
Thank you for your feedback!
--
Alexander Motin