aac(4) handling of probe when no devices are there
Alexander Sack
pisymbol at gmail.com
Wed Dec 16 17:11:01 UTC 2009
On Tue, Dec 15, 2009 at 4:54 AM, Scott Long <scottl at samsco.org> wrote:
> On Dec 14, 2009, at 2:47 PM, Alexander Sack wrote:
>>
>> Hello Again:
>>
>> I have a technical question/concern on which I am looking for
>> feedback. During the probe sequence, aac(4) handles completed
>> INQUIRY commands differently depending on the target LUN:
>>
>> aac_cam.c/aac_cam_complete():
>>         if (command == INQUIRY) {
>>                 if (ccb->ccb_h.status == CAM_REQ_CMP) {
>>                         device = ccb->csio.data_ptr[0] & 0x1f;
>>                         /*
>>                          * We want DASD and PROC devices to only be
>>                          * visible through the pass device.
>>                          */
>>                         if ((device == T_DIRECT) ||
>>                             (device == T_PROCESSOR) ||
>>                             (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>>                                 ccb->csio.data_ptr[0] =
>>                                     ((device & 0xe0) | T_NODEVICE);
>>                 } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>>                     ccb->ccb_h.target_lun != 0) {
>>                         /* fix for INQUIRYs on Lun>0 */
>>                         ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>>                 }
>>         }
>>
>> Why is CAM_DEV_NOT_THERE skipped on LUN 0?
>
> In the parallel scsi world, a selection timeout means that all LUNs within
> the entire target do not (or no longer) exist. So returning
> CAM_SEL_TIMEOUT for LUN 1 would tell CAM to invalidate LUN 0 as well.
>
> If you look higher up in this function, you'll see a note about the
> error/status codes from the AAC firmware coincidentally matching CAM's
> status codes. My guess is that somewhere along the line, someone at Adaptec
> stopped reading the SCSI spec and started returning CAM_SEL_TIMEOUT for
> LUNs greater than 0, which is why this work-around is now in the driver.
Interesting. You learn something every day. I did not know that a
selection timeout on a non-zero LUN meant that no other LUN on that
target was available. As a colleague noted, "Has Adaptec ever read the
SCSI spec?" Just kidding (somewhat)....
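A minimal sketch of the status fixup discussed above, pulled out of the driver so the LUN 0 special case is easy to see. The `sk_` names and the numeric values are stand-ins for illustration only; the real status codes come from the CAM headers, and the real logic is the else-if branch quoted from aac_cam_complete().

```c
/* Stand-in status codes (assumed values, not the real CAM definitions). */
enum sk_status {
    SK_REQ_CMP = 1,
    SK_DEV_NOT_THERE = 2,
    SK_SEL_TIMEOUT = 3
};

/*
 * Hypothetical helper mirroring the quoted branch: a selection timeout
 * reported for a non-zero LUN is rewritten to "device not there", so
 * only that LUN is dropped. On LUN 0 the timeout passes through
 * unchanged and keeps its meaning of "the whole target is absent".
 */
static enum sk_status
sk_fixup_inquiry_status(enum sk_status status, unsigned int lun)
{
    if (status == SK_SEL_TIMEOUT && lun != 0)
        return SK_DEV_NOT_THERE;
    return status;
}
```

With this shape, a timeout on LUN 1 comes back as SK_DEV_NOT_THERE, while the same timeout on LUN 0 is left alone, which matches Scott's explanation of why LUN 0 is skipped.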
>> This is true on my target
>> 6.1-amd64 machine as well as CURRENT. The reason I ask is that, now
>> that aac(4) is scanned sequentially, a lot of CAM interrupts come in
>> on my 6.x machine, where the threshold is only 500, and I get the
>> interrupt storm threshold warning for swi2 pretty quickly:
>>
>> Interrupt storm detected on "swi2:"; throttling interrupt source
>>
>> Obviously it's contingent on the number of adapters you have in your
>> system. On CURRENT I didn't see this because the threshold is double
>> (I think it's 1000 by default).
>>
>> The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
>> the scan. The probe sequence in CURRENT as well as 6.1 handles
>> CAM_SEL_TIMEOUT a little differently depending on context.
Yeah, I spoke too soon. I think that is a red herring, though, and a
misinterpretation on my part of what that code was really doing (in
this case, just seeing the device as unconfigured and moving on).
But I STILL don't understand why it's treated as an AC_LOST_DEVICE event
at scan time. It looks like more overhead than is really necessary: why
create a path and then call xpt_async(), all just to set the flag as
unconfigured? Perhaps I am not thinking of all the possibilities down
this code path, or perhaps it is simply more aligned with the model and
I'm reading too much into it.
> It's not at all clear to me what is going on here. Can you instrument the
> code to record the status of everything that is being issued to the aac_cam
> module?
Yes, surely. I think what might be happening is that after the
INQUIRY fails, xpt_release_ccb() is called, which I think also checks
whether any more CCBs should be sent to the device and sends them.
Basically, the boot -v output shows that I am getting a CAM_SEL_TIMEOUT
for each target and simply running into the default interrupt storm
threshold of 500 on 6.1.
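As a rough idea of the instrumentation Scott asked for, here is a minimal sketch of a per-status counter that could be bumped once for every completion seen by the aac_cam module. Everything here is hypothetical (the `sk_` names, the table size, and where it would be called from); it is not taken from the driver.

```c
#include <stddef.h>

#define SK_STATUS_MAX 64 /* assumed upper bound on status values tracked */

/* Histogram of completion statuses; out-of-range values go to "other". */
struct sk_status_hist {
    unsigned long count[SK_STATUS_MAX];
    unsigned long other;
};

/* Record one completion with the given (already masked) status value. */
static void
sk_hist_record(struct sk_status_hist *h, unsigned int status)
{
    if (status < SK_STATUS_MAX)
        h->count[status]++;
    else
        h->other++;
}

/* Total number of completions recorded so far. */
static unsigned long
sk_hist_total(const struct sk_status_hist *h)
{
    unsigned long total = h->other;
    size_t i;

    for (i = 0; i < SK_STATUS_MAX; i++)
        total += h->count[i];
    return total;
}
```

Dumping the table at the end of the scan would show how many completions came back as selection timeouts versus everything else, which is the number to compare against the storm threshold.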
Let me investigate further. I'm on the right track, but I need to
instrument more. Scott, it's my first time playing with CAM (be
gentle). :D
-aps