MRSAS driver/LSI MegaRaid 92XX-93XX admin question: When one of the Raid's physical drives break, how is it reported in the logs?
Doug Ambrisko
ambrisko at ambrisko.com
Thu Feb 25 19:45:55 UTC 2016
On Tue, Feb 16, 2016 at 04:00:02PM -0800, Doug Ambrisko wrote:
| On Sun, Feb 14, 2016 at 10:13:31PM +0700, Tinker wrote:
| | (Will send any followup from now only to freebsd-scsi@ .)
| |
| | Did some additional research and found that the disk failure indeed is
| | reported in MRSAS' "event log".
| |
| | So my final question then is, how do you extract it into userland (in
| | the absence of an "mfiutil" as the MFI driver has)?
|
| I have local changes to print the event log in dmesg, which gets syslogged.
| We then watch syslog for issues to report things to our customers
| automatically. This is similar to mfi(4).
I put up a couple of patches:
https://people.freebsd.org/~ambrisko/mrsas.patch
https://people.freebsd.org/~ambrisko/mrsasutil.patch
I made a number of changes to the driver to deal with issues we've seen
at work. I've done light testing and it is working better now. Most
of my testing is under FreeBSD 9.2, but the code base is from -current.
It is going through more product testing, which exposed issues with the
ioctl path. One of the major changes in the ioctl path is to let the
OS create the SG list, since user-land doesn't really know what the
kernel memory is like; the OS is left to figure that out. It also uses
the 64-bit address range. Limiting the driver address range was creating
problems when system memory was in use and potentially fragmented,
resulting in a lack of memory that could be allocated. This occurred
after our appliance had been up for a while and during tests.
It also adds support for printing event log entries to dmesg, such as:
mrsas0: 19366 (509744360s/0x0002/info) - State change on PD 16(e0x00/s1) from ONLINE(18) to FAILED(11)
mrsas0: 19367 (509744360s/0x0001/info) - State change on VD 00/0 from OPTIMAL(3) to DEGRADED(2)
mrsas0: 19368 (509744360s/0x0001/CRIT) - VD 00/0 is now DEGRADED
mrsas0: 19369 (509744371s/0x0002/info) - Rebuild started on PD 16(e0x00/s1)
mrsas0: 19370 (509744371s/0x0002/info) - State change on PD 16(e0x00/s1) from FAILED(11) to REBUILD(14)
This happens only at run time, not at boot as with mfi.
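For anyone watching syslog for these events, the sample lines above could
be picked apart roughly as follows. This is only a sketch in Python based
on the five lines shown; the regular expression, the field names (the hex
field is called "locale" here by analogy with mfi(4) output, which is an
assumption), and the needs_attention() policy are all hypothetical and not
part of the patches.

```python
import re

# Pattern inferred from the sample dmesg lines only; field names are
# illustrative labels, not names taken from the driver.
EVENT_RE = re.compile(
    r"(?P<dev>mrsas\d+): (?P<seq>\d+) "
    r"\((?P<time>\d+)s/(?P<locale>0x[0-9a-fA-F]+)/(?P<cls>\w+)\) - (?P<msg>.*)$"
)


def parse_event(line):
    """Return a dict for an mrsas event line, or None if it does not match.

    search() is used so a syslog prefix (timestamp, host, "kernel:") is
    tolerated in front of the driver name.
    """
    m = EVENT_RE.search(line.strip())
    if m is None:
        return None
    ev = m.groupdict()
    ev["seq"] = int(ev["seq"])
    ev["time"] = int(ev["time"])  # kept as the raw seconds value from the log
    return ev


def needs_attention(ev):
    """Hypothetical policy: flag non-info classes and any change to FAILED."""
    return ev["cls"].lower() != "info" or " to FAILED(" in ev["msg"]


if __name__ == "__main__":
    sample = (
        "mrsas0: 19366 (509744360s/0x0002/info) - State change on "
        "PD 16(e0x00/s1) from ONLINE(18) to FAILED(11)"
    )
    ev = parse_event(sample)
    print(ev["dev"], ev["seq"], ev["cls"], needs_attention(ev))
```

A watcher would feed each new syslog line to parse_event() and report the
ones for which needs_attention() is true; which classes or state changes
deserve a report is a local policy decision.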
I also added support for mfiutil and created a patch against mfiutil
to create a hard link to mrsasutil so it will know to automatically
use mrsas0. I created the above logs via:
    mrsasutil fail <disk>
    mrsasutil rebuild <disk>
Again, this is lightly tested. I need to test 32-bit emulation and
a 32-bit build. I need to test it with -current. It's a work in progress.
Thanks,
Doug A.