mps0-troubles
Kenneth D. Merry
ken at freebsd.org
Sat Jun 25 03:07:50 UTC 2011
On Sat, Jun 25, 2011 at 03:30:37 +0200, Joachim Tingvold wrote:
> On Fri, Feb 25, 2011, at 19:33:51PM GMT+01:00, Kenneth D. Merry wrote:
> >I just checked the change into -current, I'll merge it to -stable
> >next week.
>
> I'm back! Missed me? :-D
>
> After running fine for a while, I decided to do some more testing.
> Usual 'dd' in a while-loop over the night, and woke up to this;
>
> ###
> mps0: (0:39:0) terminated ioc 804b scsi 0 state c xfer 65536
> mps0: (0:39:0) terminated ioc 804b scsi 0 state c xfer 65536
> mps0: (0:39:0) terminated ioc 804b scsi 0 state c xfer 65536
> mps0: (0:39:0) terminated ioc 804b scsi 0 state c xfer 65536
> mps0: (0:39:0) terminated ioc 804b scsi 0 state c xfer 0
> mps0: (0:39:0) terminated ioc 804b scsi 0 state 0 xfer 0
> mps0: (0:39:0) terminated ioc 804b scsi 0 state 0 xfer 0
> mps0: (0:39:0) terminated ioc 804b scsi 0 state 0 xfer 0
> mps0: (0:39:0) terminated ioc 804b scsi 0 state 0 xfer 0
> mps0: mpssas_remove_complete on target 0x0027, IOCStatus= 0x0
> (da7:mps0:0:39:0): lost device
> (da7:mps0:0:39:0): Invalidating pack
> (da7:mps0:0:39:0): Invalidating pack
> (da7:mps0:0:39:0): Invalidating pack
> (da7:mps0:0:39:0): Invalidating pack
> (da7:mps0:0:39:0): Synchronize cache failed, status == 0xa, scsi
> status == 0x0
> (da7:mps0:0:39:0): removing device entry
> da7 at mps0 bus 0 scbus0 target 39 lun 0
> da7: <ATA WDC WD10EACS-00Z 1B01> Fixed Direct Access SCSI-5 device
> da7: 300.000MB/s transfers
> da7: Command Queueing enabled
> da7: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
> ###
>
> Now, the disk was present at the time I checked, as camcontrol confirms;
>
> [root@filserver /storage/tmp]# camcontrol devlist|grep da7
> <ATA WDC WD10EACS-00Z 1B01> at scbus0 target 39 lun 0 (pass8,da7)

Yep, this looks like what I've seen with mps controllers talking to SATA
drives through an expander under high load.

I know I've asked this before, but what brand of expander do you have, and
is it 3Gb or 6Gb? It looks like the drive is probing at 3Gb in any case.

It looks like the drive went away and came back.

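For anyone trying to spot the same pattern in their own logs, here is a
rough Python sketch that counts the "terminated" messages per target. It
assumes the message format shown in the quoted log above, and that the
tuple in parentheses is bus:target:lun. The name count_terminated is made
up for illustration and is not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed format, taken from the quoted log:
#   mps0: (0:39:0) terminated ioc 804b scsi 0 state c xfer 65536
TERMINATED = re.compile(
    r"(mps\d+): \((\d+):(\d+):(\d+)\) terminated ioc ([0-9a-f]+)"
)


def count_terminated(log_lines):
    """Count 'terminated' messages per (controller, bus, target, lun).

    Hypothetical helper: lines that do not match the assumed format
    are silently skipped.
    """
    counts = Counter()
    for line in log_lines:
        match = TERMINATED.search(line)
        if match:
            ctrl, bus, target, lun, _ioc = match.groups()
            counts[(ctrl, int(bus), int(target), int(lun))] += 1
    return counts
```

Feeding it the kernel log lines (for example the output of dmesg, split
into lines) gives one counter entry per target that had commands
terminated, which makes it easier to see whether the same drive keeps
dropping out.
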
> However, the disk was marked as "REMOVED" by 'zpool status';
>
> ###
> [jocke@filserver /storage/tmp]$ zpool status
>   pool: storage
>  state: DEGRADED
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     DEGRADED     0     0     0
>           raidz2-0  ONLINE       0     0     0
>             da8     ONLINE       0     0     0
>             da9     ONLINE       0     0     0
>             da10    ONLINE       0     0     0
>             da11    ONLINE       0     0     0
>             da15    ONLINE       0     0     0
>             da16    ONLINE       0     0     0
>           raidz2-1  DEGRADED     0     0     0
>             da0     ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>             da7     REMOVED      0     0     0
>             da12    ONLINE       0     0     0
>             da13    ONLINE       0     0     0
>         spares
>           da14      AVAIL
> ###
>
> A quick 'zpool online storage da7' works fine, as suspected, and pool
> is resilvering at the moment.
>
> I find it a bit worrisome that a disk was removed like that. It
> _could_ be that the disk isn't completely good; however, given my
> previous experiences with mps, I suspect the disk is fine (the
> smartctl readouts on the disk seem to be good as well).

The disk is probably fine. That error tends to happen when you have a lot
of contention under high load. I wish I knew why. It is something that
LSI should fix; I talked to them for a while trying to get an answer on
it, but got nowhere.

With some of the ZFS improvements that Justin is working on in -current,
I think the drive would probably have been put back into the pool
automatically when it came back.

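Until something like that is available, a small script could do the same
thing by hand. The following Python sketch is an illustration only: it
assumes the 'zpool status' layout shown above (NAME in the first column,
STATE in the second), and the helper names removed_devices and
online_removed are hypothetical. It relies only on 'zpool status <pool>'
and 'zpool online <pool> <device>'.

```python
import subprocess


def removed_devices(status_text):
    """Return the names of devices whose STATE column reads REMOVED.

    Assumes rows of the form: NAME STATE READ WRITE CKSUM.
    """
    removed = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "REMOVED":
            removed.append(fields[0])
    return removed


def online_removed(pool):
    """Hypothetical helper: try to re-online every REMOVED device in pool."""
    status = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True, text=True, check=True,
    ).stdout
    for dev in removed_devices(status):
        # Same command that was run by hand: zpool online <pool> <device>
        subprocess.run(["zpool", "online", pool, dev], check=True)
```

Calling online_removed("storage") would have re-onlined da7 in the case
above. Note that it would just as happily re-online a disk that really is
failing, so it is a stopgap at best.
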
Ken
--
Kenneth Merry
ken at FreeBSD.ORG