Some questions about da0 on USB2 (recent bad behaviour)

Sat Oct 24 03:58:17 UTC 2009

Hi there,

I have a system with a couple of Western Digital "MyBook"
USB2 drives connected to it, and have started seeing some odd
behaviour that I am not sure how to identify the cause of.
Perhaps someone could offer a suggestion or two?

The behaviour that I've noticed (and I can't remember any
particular event precipitating this, but I do track 8-STABLE
approximately weekly, so things do change from time to time...)
is that the drive will just stop for a couple of minutes, and
then continue what it was doing.  For a while that's all I
could see: no error messages at all.  The last time I booted, I
turned on verbose booting and now I see that these periods of
inactivity result in streams of syslog messages like:

Oct 24 12:48:55 duncan kernel: (da0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
Oct 24 12:48:55 duncan kernel: (da0:umass-sim0:0:0:0): Retrying Command
Oct 24 12:50:24 duncan kernel: (da0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
Oct 24 12:50:24 duncan kernel: (da0:umass-sim0:0:0:0): Retrying Command

The retry seems to be successful, because I'm not getting any
hard error messages anywhere, and the disk activity does seem to
proceed afterwards.  The disk drive isn't making any bad/broken
noises, either.

That drive is, according to dmesg.boot:

ugen1.2: <Western Digital> at usbus1
umass0: <Western Digital External HDD, class 0/0, rev 2.00/1.06, addr 2> on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x0000
Root mount waiting for: usbus1
umass0:1:0:-1: Attached to scbus1
(probe0:umass-sim0:0:0:0): Down reving Protocol Version from 2 to 0?
pass0 at umass-sim0 bus 0 target 0 lun 0
pass0: <WD 7500AAK External 1.06> Fixed Direct Access SCSI-0 device 
pass0: 40.000MB/s transfers
GEOM: new disk da0
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <WD 7500AAK External 1.06> Fixed Direct Access SCSI-0 device 
da0: 40.000MB/s transfers
da0: 715404MB (1465149168 512 byte sectors: 255H 63S/T 91201C)

What is the likelihood that these pauses and command retries are
a sign that this specific drive is in the process of dying,
physically?  If that were the case, are there any diagnostic
tools that I could run against it to show, say, internal error
logs?

What is the significance of the "sim" part of the device
designation umass-sim0?  I've looked in all of the manual pages
I can think of, but that clearly isn't enough.

usbdevs -v says "no USB controllers found", which I thought a
bit unuseful.  I assume it is *supposed* to work, is there a
trick?  usbconfig shows my connected USB devices and hubs:

ugen0.1: <OHCI root HUB nVidia> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON
ugen1.1: <EHCI root HUB nVidia> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON
ugen1.2: <External HDD Western Digital> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON
ugen1.3: <My Book Western Digital> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON
ugen0.2: <USB-PS/2 Trackball Logitech> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON
ugen0.3: <SNAPSCAN e20 AGFA> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON

At this rate the dump/restore backup that I'm running to get the
data off it will take a little over a day to finish (according
to dump), even though systat shows the drive doing about 8MB/s
while it's working, which would allow the dump to finish in
about eight hours.  These modern, large drives are all very
well, but they make doing any kind of system reconfiguration or
backup really time consuming...

Cheers,

-- 
Andrew