Tape drive needs a BUS_DEVICE_RESET but isn't getting one
Doug Ledford
dledford at redhat.com
Sun Dec 13 16:21:58 PST 1998
Randy Gobbel wrote:
>
> I'm now joining the ranks of the people having trouble with tape drives. I
> downloaded the shareware version of Arkeia. Looks very nice, but it causes my
> tape drive to hang so hard that only a reboot brings it back to life. The
> application appears to be giving some sort of Write command, which times out
> (after a very long time). This triggers an Abort, but that's not enough--the
> device is still hosed. I think a BUS_DEVICE_RESET would get it unstuck, but
> the driver isn't trying that, and I haven't found any obvious way to force this
> to happen. Here are the error messages I'm getting from the timeout:
>
> Dec 13 03:04:54 gigan kernel: scsi : aborting command due to timeout : pid
> 221414, scsi0, channel 0, id 4, lun 0 Write (6) 01 00 00 40 00
> Dec 13 03:04:54 gigan kernel: (scsi0:0:4:0) Aborting scb 24, flags 0x4
> Dec 13 03:04:54 gigan kernel: (scsi0:0:4:0) SCB disconnected. Queueing Abort
> SCB.
> Dec 13 03:04:55 gigan kernel: st0: Error 26030000 (sugg. bt 0x20, driver bt
                                           ^^^^^^^^
> 0x26, host bt 0x3).
That error indicates that we attempted to reach the device with a queued
ABORT message, and it never connected to the device because of a
SELECTION TIMEOUT. If we are getting SELTO on the queued abort command,
then a queued BDR would do the same thing. If we can't get the device
to respond to the arbitration/selection phases then we can't do anything
with it. The only thing there that *might* work is a full bus reset.
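For reference, here is a minimal sketch of how the result word in that
log line splits into the fields st0 printed. The bit layout is an
assumption that happens to reproduce the logged values (driver byte in
the top 8 bits, host byte in the next 8, suggestion as the high nibble
of the driver byte). The struct and function names are hypothetical and
are not taken from the kernel source.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout of the 32-bit SCSI result word:
 *   bits 31..24  driver byte (high nibble doubles as the "suggestion")
 *   bits 23..16  host byte
 *   bits 15..8   message byte
 *   bits  7..0   status byte
 * Names below are illustrative only. */
struct scsi_result_fields {
    uint8_t status;
    uint8_t msg;
    uint8_t host;
    uint8_t driver;
    uint8_t suggestion;
};

static struct scsi_result_fields decode_result(uint32_t result)
{
    struct scsi_result_fields f;

    f.status     = (uint8_t)(result & 0xff);
    f.msg        = (uint8_t)((result >> 8) & 0xff);
    f.host       = (uint8_t)((result >> 16) & 0xff);
    f.driver     = (uint8_t)((result >> 24) & 0xff);
    f.suggestion = (uint8_t)(f.driver & 0xf0);
    return f;
}

/* Format the fields the same way the st0 log line shows them. */
static int format_result(uint32_t result, char *buf, size_t len)
{
    struct scsi_result_fields f = decode_result(result);

    return snprintf(buf, len, "sugg. bt 0x%x, driver bt 0x%x, host bt 0x%x",
                    (unsigned)f.suggestion, (unsigned)f.driver,
                    (unsigned)f.host);
}
```

Passing 0x26030000 to format_result() gives "sugg. bt 0x20, driver bt
0x26, host bt 0x3", which matches the text in the quoted log.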
The bug in the driver, so to speak, is that when the queued abort hit a
SELTO, I should have picked up that this was the queued abort and not
the original command. In that case I should have just dropped it
instead of completing it back to the mid level SCSI code as if it were
the original command (I would still need to do some cleanup after the
queued abort). The original command would then time out again and
result in a bus reset.
> Of course I also need to figure out what the application is doing that causes
> the initial hangup, but there certainly should be some way to recover without
> rebooting.
>
> Is there some way to force a BUS_DEVICE_RESET that I don't know about? Any
> suggestions appreciated.
I'll fix that for a 5.1.7 driver. FWIW, when a queued abort command
can't make it to the device, the aic7xxx driver already recognizes that
fact, and when the mid level SCSI code calls into the reset function for
a BDR we escalate the action to a full bus reset. I just missed the case
of SELTO, since this usually happens when the bus is completely wedged,
not when it's operable.
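The recovery behaviour described in this message can be sketched roughly
as below. This is an illustration under stated assumptions, not the
aic7xxx source: every type, enum, and function name is hypothetical. It
only models the two decisions discussed here: what to do when a command
hits a selection timeout, and whether a later reset request stays a BDR
or is escalated to a full bus reset.

```c
#include <stdbool.h>

/* Hypothetical names throughout; not the real driver's data structures. */
enum cmd_kind     { CMD_ORIGINAL, CMD_QUEUED_ABORT };
enum selto_action { SELTO_COMPLETE_TO_MIDLEVEL, SELTO_DROP_QUIETLY };
enum reset_action { RESET_SEND_BDR, RESET_FULL_BUS };

struct target_state {
    bool abort_undelivered;   /* a queued abort never reached the device */
};

/* Decide what to do with a command that ended in a selection timeout. */
static enum selto_action on_selection_timeout(struct target_state *t,
                                              enum cmd_kind kind)
{
    if (kind == CMD_QUEUED_ABORT) {
        /* Do not report this as the original command. Remember that the
         * abort failed and let the original command time out again. */
        t->abort_undelivered = true;
        return SELTO_DROP_QUIETLY;
    }
    /* An ordinary command is completed back with its error as usual. */
    return SELTO_COMPLETE_TO_MIDLEVEL;
}

/* Decide how to service a reset request from the mid level code. */
static enum reset_action on_reset_request(const struct target_state *t)
{
    /* If the abort could not be delivered, a queued BDR would fail the
     * same way, so go straight to a full bus reset. */
    return t->abort_undelivered ? RESET_FULL_BUS : RESET_SEND_BDR;
}
```

In this model a target whose queued abort hit a SELTO gets a full bus
reset on the next reset request, while a target with no failed abort
still gets an ordinary BDR.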
--
Doug Ledford <dledford at redhat.com>
Opinions expressed are my own, but
they should be everybody's.