RAID monitoring tools
Anders Nordby
anders at FreeBSD.org
Sat Nov 18 18:41:31 UTC 2006
Hi,
On Sun, Oct 29, 2006 at 03:39:26PM +1100, Edwin Groothuis wrote:
> Last week we had two failing disks, and if it wasn't for a walk
> through the datacenter (which is off-site, and ten dollars away)
> we wouldn't have noticed it. I've read the thread about hpacucli,
> and have had my failed attempts to get it up and running under the
> linuxolator.
>
> So the question is: how do *you* monitor the status of your disks
> and RAID arrays? Any suggestions will be appreciated.
Apart from using camcontrol, you can monitor the system log to catch
events from the ciss driver. On a server that recently had a failing
disk, I got this in the messages log:
Nov 14 03:17:44 aicache7 kernel: ciss0: *** SCSI bus speed downshifted, SCSI port 2
Nov 14 03:17:48 aicache7 kernel: ciss0: *** Physical drive failure: SCSI port 2 ID 1
Nov 14 03:17:48 aicache7 kernel: ciss0: *** State change, logical drive 0
Nov 14 03:17:48 aicache7 kernel: ciss0: logical drive 0 (pass0) changed status OK->interim recovery, spare status 0x0
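A minimal sketch of such log monitoring, in Perl like the attached plugin. It assumes the ciss driver reports events in the form quoted above: alerts that begin with "ciss0: ***" and volume state changes that contain "changed status". Other driver or firmware versions may word things differently. The helper names ciss_events and ciss_log_status are hypothetical, and treating every matching event as CRITICAL is a simplifying assumption, not something the driver prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: given syslog lines, return the ciss event text
# ("ciss0: ...") from every line that looks like the messages above.
# ASSUMPTION: alerts start with "ciss<N>: ***" and volume state changes
# contain "changed status"; other driver versions may log differently.
sub ciss_events {
    my @events;
    foreach my $line (@_) {
        # Only kernel messages from a ciss controller are of interest
        next unless $line =~ /kernel: (ciss\d+: .*)$/;
        my $msg = $1;
        push @events, $msg
            if $msg =~ /^ciss\d+: \*\*\* / || $msg =~ /changed status /;
    }
    return @events;
}

# Hypothetical helper: turn the events into a Nagios-style result.
# Returns (exit code, status line): 0 = OK, 2 = CRITICAL.
# ASSUMPTION: any ciss event at all is treated as critical.
sub ciss_log_status {
    my @events = ciss_events(@_);
    return (0, "OK, no ciss events") unless @events;
    return (2, "CRITICAL, " . scalar(@events)
               . " ciss events, last: $events[-1]");
}
```

To use it, read the lines of the messages log (for example with open and <$fh>), pass them to ciss_log_status, print the returned status line and exit with the returned code. In practice you would also want to remember how far into the log you have already looked, so that one old event does not keep the check critical forever.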
Also attached is a Nagios plugin that checks the status of Compaq RAID
volumes using camcontrol.
Cheers,
--
Anders.
-------------- next part --------------
#! /usr/bin/perl
# anders at aftenposten.no, 2006-08-22
# check status of COMPAQ RAID volumes in FreeBSD
# Nagios exit codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN
use strict;
use warnings;

my %modelist = ();      # volume status string -> number of volumes
my $okstatus = "OK";
my $arraytxt = "COMPAQ RAID";
$ENV{PATH} = "/usr/local/bin:/usr/local/sbin:$ENV{PATH}:/sbin";
my $volumes = 0;

if (!open(CAM, "sudo -u root camcontrol devlist |")) {
    print "ERROR, could not run sudo -u root camcontrol devlist.\n";
    exit(3);
}
while (<CAM>) {
    # Only look at lines describing COMPAQ RAID volumes
    next if ($_ !~ /$arraytxt/);
    $volumes++;
    my $mode = $_;
    chomp($mode);
    # Strip "<COMPAQ RAID n VOLUME " and everything from ">" on,
    # leaving only the volume status (e.g. "OK")
    $mode =~ s@<COMPAQ RAID \d+\s+VOLUME @@;
    $mode =~ s@>.*@@;
    $modelist{$mode}++;
}
close(CAM);

# Number of volumes in OK state (0 if none)
my $okcount = defined $modelist{$okstatus} ? $modelist{$okstatus} : 0;
if ($volumes == 0) {
    print "No $arraytxt arrays found. Sudo problem?\n";
    exit(3);
} elsif ($volumes == $okcount) {
    # All volumes are OK
    print "$okcount of $volumes volumes OK\n";
    exit(0);
} else {
    # Not all volumes are OK: this is critical
    print "ERROR, $volumes volumes:";
    foreach my $key (keys %modelist) {
        print " $modelist{$key} $key";
    }
    print "\n";
    exit(2);
}
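The parsing in the plugin can also be factored into a subroutine that works on plain strings, so it can be tried against sample camcontrol devlist output without sudo or the RAID hardware. This is a sketch, not the plugin as posted: raid_status is a hypothetical name, the match is slightly stricter than the original (a line must contain the whole "<COMPAQ RAID n VOLUME status>" string), and the per-status counts are sorted so the output is stable.

```perl
use strict;
use warnings;

# Hypothetical refactoring of the plugin's logic: classify
# "camcontrol devlist" lines passed in as strings.
# ASSUMPTION: volume lines look like "<COMPAQ RAID 1  VOLUME OK> ...",
# as implied by the plugin's regexes.
# Returns (Nagios exit code, status line): 0 = OK, 2 = CRITICAL,
# 3 = UNKNOWN.
sub raid_status {
    my %modelist;       # volume status string -> number of volumes
    my $volumes = 0;
    foreach my $line (@_) {
        # Capture the status between "VOLUME " and the closing ">"
        next unless $line =~ /<COMPAQ RAID \d+\s+VOLUME ([^>]*)>/;
        $volumes++;
        $modelist{$1}++;
    }
    return (3, "No COMPAQ RAID arrays found. Sudo problem?")
        if $volumes == 0;
    my $ok = $modelist{"OK"} || 0;
    return (0, "$ok of $volumes volumes OK") if $ok == $volumes;
    # Some volume is not OK: list the count for each status
    my $detail = join(" ", map { "$modelist{$_} $_" } sort keys %modelist);
    return (2, "ERROR, $volumes volumes: $detail");
}
```

The main part of the plugin would then just run camcontrol devlist as before, pass the lines to raid_status, print the message and exit with the returned code.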
More information about the freebsd-proliant mailing list