ZFS and 3ware controller resets
Adam Nowacki
nowakpl at platinum.linux.pl
Sun Sep 25 15:51:34 UTC 2011
I have a 20-disk storage system. Every now and then a disk dies and
causes the 3ware controller to reset because of disk timeouts. This cuts
ZFS off from all disks, even the healthy ones, and the system requires a
hard reset.
Two issues here:
1) Why does the controller have to reset? That's a completely insane way of
dealing with a drive timeout.
2) ZFS does not reopen the disks after the controller reset.
FreeBSD version: 8.1-RELEASE-p1
/c0 Driver Version = 3.80.06.003
/c0 Model = 9650SE-16ML
/c0 Available Memory = 224MB
/c0 Firmware Version = FE9X 4.10.00.007
/c0 Bios Version = BE9X 4.08.00.002
/c0 Boot Loader Version = BL9X 3.08.00.001
  pool: zp2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da7p1   ONLINE       0     0     0
            da9p1   ONLINE       0     0     0
            da8p1   ONLINE       0     0     0
            da10p1  ONLINE       0     0     0
Then, when a disk starts misbehaving:
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 7c 43 b8 0 0 10 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 ce e5 ca 30 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a4 2d 2d f8 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 91 7c f8 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: Request 72 timed out!
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=3
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: [ITHREAD]
(da1:twa0:0:1:0): lost device
(da2:twa0:0:2:0): lost device
(da3:twa0:0:3:0): lost device
(da4:twa0:0:4:0): lost device
(da5:twa0:0:5:0): lost device
(da6:twa0:0:6:0): lost device
(da7:twa0:0:7:0): lost device
(da8:twa0:0:8:0): lost device
(da9:twa0:0:9:0): lost device
(da10:twa0:0:10:0): lost device
(da11:twa0:0:11:0): lost device
(da12:twa0:0:12:0): lost device
(da13:twa0:0:13:0): lost device
(da1:twa0:0:1:0): removing device entry
da1 at twa0 bus 0 scbus0 target 1 lun 0
da1: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da2:twa0:0:2:0): removing device entry
da2 at twa0 bus 0 scbus0 target 2 lun 0
da2: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da2: 100.000MB/s transfers
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da3:twa0:0:3:0): removing device entry
da3 at twa0 bus 0 scbus0 target 3 lun 0
da3: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da3: 100.000MB/s transfers
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da4:twa0:0:4:0): removing device entry
da4 at twa0 bus 0 scbus0 target 4 lun 0
da4: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da4: 100.000MB/s transfers
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da5:twa0:0:5:0): removing device entry
da5 at twa0 bus 0 scbus0 target 5 lun 0
da5: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da5: 100.000MB/s transfers
da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da6:twa0:0:6:0): removing device entry
da6 at twa0 bus 0 scbus0 target 6 lun 0
da6: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da6: 100.000MB/s transfers
da6: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da7:twa0:0:7:0): removing device entry
da7 at twa0 bus 0 scbus0 target 7 lun 0
da7: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da7: 100.000MB/s transfers
da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da8:twa0:0:8:0): removing device entry
da8 at twa0 bus 0 scbus0 target 8 lun 0
da8: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da8: 100.000MB/s transfers
da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da9:twa0:0:9:0): removing device entry
da9 at twa0 bus 0 scbus0 target 9 lun 0
da9: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da9: 100.000MB/s transfers
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da10:twa0:0:10:0): removing device entry
da10 at twa0 bus 0 scbus0 target 10 lun 0
da10: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da10: 100.000MB/s transfers
da10: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da11:twa0:0:11:0): removing device entry
da11 at twa0 bus 0 scbus0 target 11 lun 0
da11: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da11: 100.000MB/s transfers
da11: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da12:twa0:0:12:0): removing device entry
da12 at twa0 bus 0 scbus0 target 12 lun 0
da12: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da12: 100.000MB/s transfers
da12: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da13:twa0:0:13:0): removing device entry
da13 at twa0 bus 0 scbus0 target 13 lun 0
da13: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da13: 100.000MB/s transfers
da13: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
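
For anyone wanting to summarize a log like the one above, here is a minimal sketch in Python. It counts the "Drive timeout detected" messages per controller port and decodes the READ(10) CDBs into a starting LBA and a block count (bytes 2-5 and 7-8 of the CDB, big-endian). The names SAMPLE_LOG, decode_read10 and summarize are made up for illustration, and the regular expressions only assume the message layout seen in this post; other driver or FreeBSD versions may print these lines differently.

```python
import re
from collections import Counter

# A few lines copied from the log above. In practice the text would come
# from wherever the kernel messages are stored on the system.
SAMPLE_LOG = """\
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
"""

# Patterns assume the exact message layout shown in this post.
TIMEOUT_RE = re.compile(r"^(twa\d+): ERROR: .*Drive timeout detected: port=(\d+)")
CDB_RE = re.compile(r"^\((da\d+):[^)]*\): READ\(10\)\. CDB: ([0-9a-f ]+)$")


def decode_read10(cdb_text):
    """Return (lba, block_count) for a READ(10) CDB printed as hex bytes."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")    # bytes 2-5: logical block address
    count = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length in blocks
    return lba, count


def summarize(log_text):
    """Count timeouts per (controller, port) and collect the decoded failed reads."""
    timeouts = Counter()
    reads = []
    for line in log_text.splitlines():
        line = line.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[(m.group(1), int(m.group(2)))] += 1
            continue
        m = CDB_RE.match(line)
        if m:
            reads.append((m.group(1),) + decode_read10(m.group(2)))
    return timeouts, reads


if __name__ == "__main__":
    timeouts, reads = summarize(SAMPLE_LOG)
    for (ctrl, port), n in sorted(timeouts.items()):
        print(f"{ctrl} port {port}: {n} timeouts")
    for dev, lba, count in reads:
        print(f"{dev}: failed read at LBA {lba:#x}, {count} blocks")
```

The LBAs are relative to the unit the controller exports (da3 here), not to the ZFS partition on it.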
  pool: zp2
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       7    11     0
          raidz2    ONLINE      16    32     0
            da1p1   ONLINE       4    10     0
            da2p1   ONLINE       4    10     0
            da3p1   ONLINE       5   642     1
            da4p1   ONLINE       3     8     0
            da5p1   ONLINE       3    12     0
            da6p1   ONLINE       3    12     0
            da7p1   ONLINE       3    12     0
            da9p1   ONLINE       3    12     0
            da8p1   ONLINE       3    14     0
            da10p1  ONLINE       3    10     0

errors: 10 data errors, use '-v' for a list
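
The status output above shows errors on every disk after the reset, with da3p1 standing out by a wide margin. As a rough sketch, the config rows can be parsed in Python to pick out the device with the most errors. SAMPLE_STATUS, parse_config and worst_leaf are hypothetical names, and the parser assumes five whitespace-separated columns with plain integer counters; rows whose counters are printed in any other form are simply skipped.

```python
# A subset of the rows from the status output above.
SAMPLE_STATUS = """\
NAME STATE READ WRITE CKSUM
zp2 ONLINE 7 11 0
raidz2 ONLINE 16 32 0
da1p1 ONLINE 4 10 0
da3p1 ONLINE 5 642 1
da8p1 ONLINE 3 14 0
"""


def parse_config(text):
    """Turn 'NAME STATE READ WRITE CKSUM' rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[0] == "NAME":
            continue  # header, blank line, or some other part of the output
        name, state, *counts = parts
        if not all(c.isdigit() for c in counts):
            continue  # counters not printed as plain integers
        read, write, cksum = (int(c) for c in counts)
        rows.append({"name": name, "state": state,
                     "read": read, "write": write, "cksum": cksum})
    return rows


def worst_leaf(rows, prefix="da"):
    """Return the leaf device (name starting with prefix) with the most errors."""
    leaves = [r for r in rows if r["name"].startswith(prefix)]
    return max(leaves, key=lambda r: r["read"] + r["write"] + r["cksum"])


if __name__ == "__main__":
    worst = worst_leaf(parse_config(SAMPLE_STATUS))
    print(worst["name"], worst["read"], worst["write"], worst["cksum"])
```

The "da" prefix is only an assumption that matches the device names in this pool; pools built on labels or other device types would need a different filter.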