gmirror refused to connect second disk after a reboot
Scott Lambert
lambert at lambertfam.org
Wed Jun 9 06:47:54 UTC 2010
On Sun, Jun 06, 2010 at 12:45:15PM -0700, Jeremy Chadwick wrote:
> On Sun, Jun 06, 2010 at 01:55:51PM -0500, Scott Lambert wrote:
> > I have one dual PIII machine doing the same to me. I've been assuming
> > my issue is with the ATA controller. ...
<snip>
> I agree -- these look like you have either a bad PATA cable, a PATA
> controller port which has gone bad, or a PATA controller which is
> behaving *very* badly (internal IC problems). ICRC errors indicate data
> transmission failures between the controller and the disk.
>
> Since these are classic PATA disks, ad0 is probably the master and ad2
> is the slave -- but both are probably on the same physical cable.
>
> The LBAs for both ad0 and ad2 are very close (ad0=242235039,
> ad2=242234911), which makes sense since they're in a mirror config. But
> two disks going kaput at the same time, around the same LBA? I have my
> doubts.
I think I actually made sure that ad0 and ad2 are on their own cables.
ad0 may be sharing with acd0 though.
Yeah, looks like it.
01:16:24 Wed Jun 09 $ sudo atacontrol list
ATA channel 0:
    Master:  ad0 <WDC WD2500JB-57REA0/20.00K20> ATA/ATAPI revision 7
    Slave:  acd0 <LG CD-ROM CRD-8521B/1.04> ATA/ATAPI revision 0
ATA channel 1:
    Master:  ad2 <WDC WD2500JB-57REA0/20.00K20> ATA/ATAPI revision 7
    Slave:       no device present
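
The same check can be scripted. What follows is only a rough Python
sketch, assuming the "atacontrol list" layout shown above; the function
names (parse_atacontrol_list, shared_channels) are made up for
illustration and are not part of any FreeBSD tool.

```python
import re

# Output of "atacontrol list" as pasted above.
SAMPLE = """\
ATA channel 0:
Master: ad0 <WDC WD2500JB-57REA0/20.00K20> ATA/ATAPI revision 7
Slave: acd0 <LG CD-ROM CRD-8521B/1.04> ATA/ATAPI revision 0
ATA channel 1:
Master: ad2 <WDC WD2500JB-57REA0/20.00K20> ATA/ATAPI revision 7
Slave: no device present
"""


def parse_atacontrol_list(text):
    """Map each ATA channel number to {'Master': dev, 'Slave': dev}."""
    channels = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indented or flattened output
        m = re.match(r"ATA channel (\d+):", line)
        if m:
            current = int(m.group(1))
            channels[current] = {}
            continue
        m = re.match(r"(Master|Slave):\s+(\S+)", line)
        # Skip empty positions, which are reported as "no device present".
        if m and current is not None and not line.endswith("no device present"):
            channels[current][m.group(1)] = m.group(2)
    return channels


def shared_channels(channels):
    """Return only the channels (cables) that carry more than one device."""
    return {ch: sorted(devs.values())
            for ch, devs in channels.items() if len(devs) > 1}


if __name__ == "__main__":
    print(shared_channels(parse_atacontrol_list(SAMPLE)))
```

For the listing above this reports only channel 0, carrying ad0 and
acd0; ad2 is alone on channel 1.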
> SMART statistics for both of the disks themselves would help determine
> if the disks are seeing issues or if the disks are also seeing problems
> communicating with the PATA controller. (Depends on the age of the disks
> though; some older PATA disks don't have the SMART attribute that
> describes this).
The drives are only a couple of years old. The box itself is ancient.
:-) The ICRC errors only seem to have occurred right after boot.
I'll jerk the box apart to check/change the cabling when I get a chance.
Maybe I'll just dump the cd drive.
> What you should be worried about -- FreeBSD sees problems on both ad0
> and ad2. ad2 is offline cuz of the problem, but ad0 isn't. Chances are
> ad0 is going to fall off the bus eventually because of this problem. I
> really hope you do backups regularly (daily) if you plan on just
> ignoring this problem.
AMANDA takes care of things. Also, this box is not terribly important.
I rebuilt the array Sunday. I don't see anything terribly scary in the
smartctl output.
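
For reference, the counter most drives use for interface CRC errors is
SMART attribute 199, which smartmontools usually labels
UDMA_CRC_Error_Count. A minimal Python sketch for pulling its raw value
out of "smartctl -A" text follows. It assumes the usual ten-column
attribute table with the raw value in the last column and a plain
integer raw value; the function name is hypothetical.

```python
def udma_crc_errors(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if absent.

    Assumes the attribute table printed by "smartctl -A":
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 199 is the CRC counter.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None
```

The text to feed it would come from something like
"smartctl -A /dev/ad0". A raw value that keeps climbing points at the
cable or controller rather than the platters; a drive that does not
report the attribute returns None here.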
Anyway, I do hope I haven't hijacked the thread for the OP. I actually
just wanted to offer a possible matching datapoint.
--
Scott Lambert KC5MLE Unix SysAdmin
lambert at lambertfam.org