Re: USB Disk Stalls on -current

From: Mehmet Erol Sanliturk <m.e.sanliturk_at_gmail.com>
Date: Sun, 06 Feb 2022 20:05:14 UTC
On Sun, Feb 6, 2022 at 10:11 PM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Sun, Feb 6, 2022 at 12:02 PM Sean Bruno <sbruno@freebsd.org> wrote:
>
>>
>>
>> >
>> >
>> > So there are some tools you can use. For USB, there's usbdump that can
>> > get you the USB transactions. I've not used it enough to give more
>> > details here. This will let you know what's going on, and when, on the
>> > USB endpoint.
>> >
>> > You can also enable the CAM_IOSCHED stuff. This will allow you to get
>> > latency measurements for 'requests in the sim' which basically will
>> > tell you what your latency spread is for the drives. This will tell
>> > you if things are getting caught up in the USB layer, or after CAM's
>> > da driver completes the I/O request (granted, that's almost certainly
>> > not happening, but it will help you figure out what's going on and put
>> > numbers to the oddities you are seeing).
>> >
>> > Also, make sure you have good cables. I've had lots of hiccups over the
>> > years from dodgy USB cables. Also make sure you have good, high quality
>> > enclosures. Many from the USB2 time-period are sketchy at best and I
>> > went through several at one point trying to find a good one. I'd be
>> > tempted to get USB 3 enclosures. I've had better luck with USB3 gear
>> > than USB2 gear here, but you need a USB-3 controller to get USB-3
>> > speeds which might not be compatible with the NUC's built-in stuff
>> > (though my NUC has one USB3 port, there's lots of different models).
>> >
>> > Usually, though, I see weirdness associated with dmesg messages from
>> > usb, cam, etc when the hardware is on the sketchy end.
>> >
>> > Warner
>>
>> I'm assuming that I have a fairly dodgy USB device, as the pauses seem
>> to correspond to this from CAM being emitted:
>>
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28 00 36 69 02 6e 00 00 80 00
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB request completed with an error
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command, 2 more tries remain
>>
>>
>> Things resume after this is emitted, but there is a substantial
>> (multiple minutes) pause here.  I would assume that timeouts would fire
>> much quicker.
>>
>
> The default timeout is 60s.
>
> You can reduce that substantially by setting kern.cam.da.default_timeout
> to a smaller value. Disk operations complete within 5s these days,
> except spin ups. Heck, nearly all complete within 500ms. You
> might try setting this value to maybe 3 or 5 or 10 to see if that helps the
> hiccups without introducing extra retries when the load is heavy. The
> smaller values give a faster recovery, but too small a number may result
> in timeouts and errors under load. I think you need to set this as a
> tunable.
>
> Warner
>
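
To put rough numbers on the timeout discussion above, here is a small
sketch of the worst-case stall for one I/O request. It assumes the first
attempt and every retry each wait out the full timeout, which may not be
what happens here (the log shows an error completion, not necessarily a
timeout). The 60s default is the value Warner quotes; the retry count of 4
is my assumption for kern.cam.da.retry_count, and worst_case_stall() is
just a name made up for this illustration.

```python
def worst_case_stall(timeout_s: int, retries: int) -> int:
    """Upper bound in seconds for one request, assuming the first attempt
    and each retry all wait out the full timeout before giving up."""
    return timeout_s * (retries + 1)


# ASSUMPTION: 4 retries per command; check kern.cam.da.retry_count locally.
ASSUMED_RETRIES = 4

# 60 is the default quoted above; 10, 5 and 3 are the values suggested to try.
for timeout_s in (60, 10, 5, 3):
    stall = worst_case_stall(timeout_s, ASSUMED_RETRIES)
    print(f"{timeout_s:>2}s timeout -> up to {stall}s stalled")
```

Under those assumptions the default works out to several minutes per
request, which is at least consistent with the pauses reported. If the
value does have to be set as a tunable, as Warner suspects, a line such as
kern.cam.da.default_timeout=10 in /boot/loader.conf would be the place to
try it.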



Are your external disks "GREEN", i.e. the "energy saver" kind?

If the external disks are the energy-saver kind, they go to sleep when they
have not been used for a while, and waking them up takes time. That causes
significant delays, because every access after such a sleep has to wait for
the wake-up.

Another important problem is that USB external disks are slow compared with
internal (non-energy-saver) SATA disks.

When response time is important, it is necessary to avoid such "GREEN"
disks.



Mehmet Erol Sanliturk