cvs commit: src/sys/cam/scsi scsi_da.c

Fri Feb 2 15:18:16 UTC 2007

Nate Lawson wrote:
> Scott Long wrote:
>> mjacob at freebsd.org wrote:
>>>>
>>>> umass should probably just disable the SYNC_CACHE commands from CAM,
>>>> as well as whatever other commands are always quirked.  The firewire SIM
>>>> should probably do the same.
>>>>
>>>
>>> Err, probably should be XPORT based?
>>
>> Ah, very true.  Taking that a step further, there should probably be a 
>> broader concept of RBC and/or MMC as opposed to the assumption that 
>> everything is SBC.
>>
>> Scott
> 
> I have some experience with that (see the NO_6_BYTE sim option I added 
> for usb and firewire).  Of course, that was a hack and should be an XPORT 
> setting as you point out.
> 
> However, I don't think the umass situation is the same.  That's why I 
> haven't acted on it yet.  The issue is that SYNC_CACHE is a perfectly 
> valid RBC command.  Some devices support it and it works (50% of flash 
> drives, my guess), some reject it but continue processing commands (25% 
> maybe), and some hang after receiving it (10-25%).  Obviously, the type 
> of device determines whether it's more likely to support this or not 
> (usb hard drive, almost certainly; mp3 player, probably not).
> 
> For the devices that hang, I have a strong suspicion that their firmware 
> state machine looks like this:
>     case SYNC_CACHE:
>          OptionallyWriteData();
>          while (1);  // wait for detach
> 
> Florent Thoumie (flz@) started some work based on some evidence that 
> Linux checks a "write cache present" bit in the INQUIRY data and decides 
> whether or not to run SYNC_CACHE based on that.  It's unknown yet how 
> closely this bit correlates with the hanging behavior though.
> 
> I think Windows actually never runs SYNC_CACHE unless you select "detach 
> device".  So if we added the capability for a device_eject() newbus 
> method and the default implementation ran device_shutdown(), then 
> scsi_da(4) could run SYNC_CACHE only from its shutdown method and thus 
> it wouldn't matter if the device hung from it.  Right now, we run 
> SYNC_CACHE from daclose() and so umounting the drive is enough to cause 
> a hang, and the hangs on boot are from GEOM tasting the drive 
> (daopen/daclose).  With this change, a device could be plugged in and 
> mounted/umounted multiple times.  Only when the user said "about to 
> eject" would it run SYNC_CACHE.  The only limitation is that after 
> running "eject", the device would have to be unplugged and replugged 
> before it could be mounted again.  But that's expected behavior.
> 
> Combine this with the write cache bit detection and you have a robust 
> solution.  Comments?
> 
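
For illustration, the "write cache present" check mentioned in the quoted
text could look roughly like the sketch below.  It assumes the bit in
question is WCE in the SCSI Caching mode page (page code 0x08, byte 2,
bit 2), which is fetched with MODE SENSE rather than INQUIRY; that mapping
is an assumption and is not confirmed anywhere in this thread.  The
function names are hypothetical and are not part of scsi_da.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHING_MODE_PAGE_CODE 0x08  /* SCSI Caching mode page */
#define CACHING_WCE_BIT        0x04  /* byte 2, bit 2: write cache enable */

/*
 * Hypothetical helper: 'page' points at the first byte of a mode page,
 * i.e. after the mode parameter header and any block descriptors.
 * Returns true only if this is the Caching page and WCE is set.
 */
bool
caching_page_wce(const uint8_t *page, size_t len)
{
    if (page == NULL || len < 3)
        return false;
    /* Low six bits of byte 0 hold the page code; the top bits are flags. */
    if ((page[0] & 0x3f) != CACHING_MODE_PAGE_CODE)
        return false;
    return (page[2] & CACHING_WCE_BIT) != 0;
}

/*
 * Hypothetical policy: only send SYNC_CACHE when the device reports an
 * enabled write cache.  If the page is missing or malformed, skip it.
 */
bool
should_send_sync_cache(const uint8_t *page, size_t len)
{
    return caching_page_wce(page, len);
}
```

Defaulting to "do not send" when the page cannot be read is a design choice
of this sketch; as the quoted text notes, how well the bit predicts the
hanging behavior is unknown.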

What you describe is exactly my intention.  I didn't mean to imply that
a new XPORT becomes the dumping ground for the quirk table.
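
A minimal userland sketch of the eject proposal quoted above, to make the
intended behavior concrete.  Everything here is hypothetical: struct
sim_device and the sim_* functions are stand-ins invented for illustration,
not the newbus or scsi_da(4) interfaces, and device_eject() does not exist
in the tree at the time of this message.  It only models the control flow:
close no longer sends SYNC_CACHE, the default eject falls through to
shutdown, and an ejected device refuses further opens.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a newbus device; not the real device_t. */
struct sim_device {
    const char *name;
    int (*shutdown)(struct sim_device *); /* models device_shutdown() */
    int (*eject)(struct sim_device *);    /* models the proposed eject
                                             method; NULL means default */
    bool ejected;
    int open_count;
    int sync_cache_count;                 /* SYNC_CACHE commands "sent" */
};

/* Default eject implementation: just run the shutdown method. */
static int
sim_default_eject(struct sim_device *dev)
{
    return dev->shutdown(dev);
}

/* Models the proposed device_eject(): idempotent once it succeeds. */
int
sim_device_eject(struct sim_device *dev)
{
    int error;

    if (dev->ejected)
        return 0;
    error = (dev->eject != NULL) ? dev->eject(dev) : sim_default_eject(dev);
    if (error == 0)
        dev->ejected = true;
    return error;
}

/* da-like shutdown: the only place SYNC_CACHE is issued in this model. */
static int
sim_da_shutdown(struct sim_device *dev)
{
    dev->sync_cache_count++;  /* stands in for sending SYNC_CACHE */
    return 0;
}

/* Open fails after eject; the device must be replugged (re-attached). */
int
sim_da_open(struct sim_device *dev)
{
    if (dev->ejected)
        return ENXIO;
    dev->open_count++;
    return 0;
}

/* Close deliberately does not send SYNC_CACHE. */
int
sim_da_close(struct sim_device *dev)
{
    if (dev->open_count == 0)
        return EINVAL;
    dev->open_count--;
    return 0;
}

/* Models plugging the device in. */
void
sim_da_attach(struct sim_device *dev, const char *name)
{
    dev->name = name;
    dev->shutdown = sim_da_shutdown;
    dev->eject = NULL;  /* use the default, which calls shutdown */
    dev->ejected = false;
    dev->open_count = 0;
    dev->sync_cache_count = 0;
}
```

In this model, repeated open/close cycles (mount/umount, or GEOM tasting)
leave sync_cache_count at zero; only sim_device_eject() raises it, after
which sim_da_open() returns ENXIO until sim_da_attach() is called again.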

Btw, for the record, your assumption about SYNC_CACHE also applies to
RAID controllers, which is why Pawel's BIO_FLUSH hack is so dangerous
and wrong.

Scott