Overlapped Commands error
Andrew Boyer
aboyer at averesystems.com
Wed Jun 16 16:33:13 UTC 2010
Hello SCSI experts,
We recently saw this SCSI command error:
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10). CDB: 28 0 2 c8 7f a0 0 0 20 0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
No one here has ever seen this before. We're using a CAM and MPT stack from August 2009 with an LSI1068e HBA connected to Seagate SAS HDDs.
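For reference, the failing CDB in the log can be decoded by hand. Below is a minimal C sketch of that decoding, assuming the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The names `read10_fields` and `decode_read10` are made up for illustration and do not come from CAM or the mpt driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in a 10-byte READ(10) CDB (hypothetical helper type). */
struct read10_fields {
    uint32_t lba;      /* logical block address, CDB bytes 2..5, big-endian */
    uint16_t nblocks;  /* transfer length in blocks, CDB bytes 7..8, big-endian */
};

/*
 * Decode a READ(10) CDB.
 * Returns 0 on success, -1 if the operation code is not 0x28.
 */
static int
decode_read10(const uint8_t cdb[10], struct read10_fields *out)
{
    assert(cdb != NULL && out != NULL);

    if (cdb[0] != 0x28)
        return -1;

    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->nblocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Fed the bytes from the log (28 0 2 c8 7f a0 0 0 20 0), this gives LBA 0x02c87fa0 and a transfer length of 0x20 (32) blocks.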
This is what the SCSI Architecture Model (SAM-5 draft) has to say about overlapped commands:
> 5.10 Overlapped commands
> An overlapped command occurs when a task manager or a task router detects the use of a duplicate I_T_L_Q nexus (see 4.6.6) in a command before that I_T_L_Q nexus completes its command lifetime (see 5.5). Each SCSI transport protocol standard shall specify whether or not a task manager or a task router is required to detect overlapped commands.
> A task manager or a task router that detects an overlapped command shall abort all commands received on the I_T nexus on which the overlapped command was received and the device server shall return a CHECK CONDITION status for the overlapped command. The sense key shall be set to ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED COMMANDS ATTEMPTED.
> NOTE 11 - An overlapped command may be indicative of a serious error and, if not detected, may result in corrupted data. This is considered a catastrophic failure on the part of the SCSI initiator device. Therefore, vendor specific error recovery procedures may be required to guarantee the data integrity on the medium. The SCSI target device logical unit may return additional sense data to aid in this error recovery procedure (e.g., sequential-access devices may terminate the overlapped command with the residue of blocks remaining to be written or read at the time the second command was received).
> 4.8.2 Command identifier
> A command identifier (i.e., the Q in an I_T_L_Q nexus) is assigned by a SCSI initiator device to uniquely identify one command in the context of a particular I_T_L nexus, allowing more than one command to be outstanding for that I_T_L nexus at the same time. Each SCSI transport protocol defines the size of the command identifier, up to a maximum of 64 bits, to be used by SCSI ports that support that SCSI transport protocol.
> SCSI transport protocols may define additional restrictions on command identifier assignments (e.g., requiring command identifiers to be unique per I_T nexus or per I_T_L nexus, or sharing command identifier values with other uses such as task management functions).
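To make the quoted rule concrete, here is a rough C sketch of how a target-side task manager might detect a duplicate command identifier. It is a toy model under stated assumptions, not code from any real target or from the drive firmware: it tracks a single nexus only, uses a small fixed-size table, and every name in it (`nexus_state`, `nexus_submit`, `nexus_complete`, `MAX_OUTSTANDING`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8  /* hypothetical queue depth for this toy model */

/* Command identifiers (the Q in I_T_L_Q) currently outstanding on one nexus. */
struct nexus_state {
    uint64_t tags[MAX_OUTSTANDING];
    size_t   count;
};

enum submit_result {
    SUBMIT_OK,          /* tag accepted and now outstanding */
    SUBMIT_OVERLAPPED,  /* duplicate tag: all outstanding commands aborted */
    SUBMIT_QUEUE_FULL   /* no room in the table */
};

/* Accept a new command, or detect that its tag is already in use. */
static enum submit_result
nexus_submit(struct nexus_state *n, uint64_t tag)
{
    assert(n != NULL);

    for (size_t i = 0; i < n->count; i++) {
        if (n->tags[i] == tag) {
            /*
             * Overlapped command: per the quoted text, abort every
             * command on the nexus. A real device server would also
             * return CHECK CONDITION, sense key ABORTED COMMAND,
             * additional sense code OVERLAPPED COMMANDS ATTEMPTED.
             */
            n->count = 0;
            return SUBMIT_OVERLAPPED;
        }
    }
    if (n->count == MAX_OUTSTANDING)
        return SUBMIT_QUEUE_FULL;

    n->tags[n->count++] = tag;
    return SUBMIT_OK;
}

/* Mark a command as finished so its tag may be reused. */
static void
nexus_complete(struct nexus_state *n, uint64_t tag)
{
    assert(n != NULL);

    for (size_t i = 0; i < n->count; i++) {
        if (n->tags[i] == tag) {
            n->tags[i] = n->tags[n->count - 1];  /* fill the hole with the last entry */
            n->count--;
            return;
        }
    }
}
```

In this model, a tag may be reused only after `nexus_complete()` has been called for it. An initiator that reissues a tag before seeing the completion, for example after a timeout it handled incorrectly, would trigger the overlapped case.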
Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but it's the initiator in this case. Any advice?
Also, is CAM doing the right thing by retrying? scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?
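To illustrate what I mean by looking at the Additional Sense Code, here is a hedged C sketch of the decision. This is not the actual scsi_error_action() code and does not use CAM's types; `sense_triple`, `choose_recovery`, and the constant names are invented for this example. The values used are the standard ones: sense key 0x0b (ABORTED COMMAND) and ASC/ASCQ 0x4e/0x00 (OVERLAPPED COMMANDS ATTEMPTED), matching the "asc:4e,0" in the log. Whether failing the command is really the right policy is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ABORTED_COMMAND   0x0b  /* sense key: ABORTED COMMAND */
#define ASC_OVERLAPPED_CMDS  0x4e  /* ASC: OVERLAPPED COMMANDS ATTEMPTED */
#define ASCQ_OVERLAPPED_CMDS 0x00

/* Sense key plus additional sense code and qualifier (hypothetical type). */
struct sense_triple {
    uint8_t key;
    uint8_t asc;
    uint8_t ascq;
};

enum recovery {
    RECOVERY_RETRY,  /* reissue the command */
    RECOVERY_FAIL,   /* treat as fatal and report the error upward */
    RECOVERY_OTHER   /* not an aborted command; handled elsewhere */
};

/*
 * Choose a recovery action for an ABORTED COMMAND sense key.
 * Unlike an unconditional retry, this inspects the ASC/ASCQ and
 * refuses to retry when the target reported overlapped commands.
 */
static enum recovery
choose_recovery(const struct sense_triple *s)
{
    assert(s != NULL);

    if (s->key != SK_ABORTED_COMMAND)
        return RECOVERY_OTHER;

    if (s->asc == ASC_OVERLAPPED_CMDS && s->ascq == ASCQ_OVERLAPPED_CMDS)
        return RECOVERY_FAIL;

    return RECOVERY_RETRY;
}
```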
Thanks,
Andrew
--------------------------------------------------
Andrew Boyer aboyer at averesystems.com