Re: xhci USB transaction error and subsequent recovery mechanism on Freebsd stable/12

From: Poul-Henning Kamp <phk_at_phk.freebsd.dk>
Date: Tue, 12 Apr 2022 21:47:34 UTC
--------
mahesh mv writes:

> READ(10) errors. The READ(10) error recovers with in couple of retries most
> of the times but few cases we have observed that the read retries gets exhausted and [...]

About once a week I see the same general phenomena on a RockPro64 running 13.0-RELEASE-p6.

After "usbconfig -d 5.7 reset" and "zpool clear" the ZFS mirror is back in operation.

The other drive in the mirror, (-d 5.4) never does this.

This server has a twin ("keith") with different USB disks, never seen it there either.

Despite spending a fair amount of time on it, I have never been
able to find any hints of hardware trouble (ie: USB drive, USB cables
or the industrial-grade powered USB-hub.)

Usbconfig on the system in question:

	ugen0.1: <Generic EHCI root HUB> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
	ugen2.1: <Generic EHCI root HUB> at usbus2, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA)
	ugen1.1: <Generic OHCI root HUB> at usbus1, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
	ugen5.1: <Synopsys XHCI root HUB> at usbus5, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
	ugen4.1: <Synopsys XHCI root HUB> at usbus4, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)
	ugen3.1: <Generic OHCI root HUB> at usbus3, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE (0mA)
	ugen5.2: <vendor 0x1a40 USB 2.0 Hub MTT> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA)
	ugen5.3: <FTDI TTL-234X-5V> at usbus5, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (90mA)
	ugen5.4: <Seagate Expansion> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (500mA)
	ugen5.5: <FTDI USB-RS485 Cable> at usbus5, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (300mA)
	ugen5.6: <ASIX Elec. Corp. AX88178> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (450mA)
	ugen5.7: <LaCie Rugged USB-C> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (500mA)

Most recent event was yesterday:

	Apr 11 03:01:04 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 08 42 8f 60 00 00 08 00 
	Apr 11 03:01:04 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:04 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain
	Apr 11 03:01:04 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 08 42 8f 60 00 00 08 00 
	Apr 11 03:01:04 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:04 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain
	Apr 11 03:01:05 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 08 42 8f 60 00 00 08 00 
	Apr 11 03:01:05 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:05 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain
	Apr 11 03:01:05 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 08 42 8f 60 00 00 08 00 
	Apr 11 03:01:05 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:05 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain
	Apr 11 03:01:06 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 08 42 8f 60 00 00 08 00 
	Apr 11 03:01:06 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:06 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Error 5, Retries exhausted
	Apr 11 03:01:06 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:06 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:06 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain
	Apr 11 03:01:07 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:07 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:07 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain
	Apr 11 03:01:07 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:07 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:07 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain
	Apr 11 03:01:08 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:08 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:08 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain
	Apr 11 03:01:08 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:08 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:08 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Error 5, Retries exhausted
	Apr 11 03:01:09 <23.4> mick ZFS[48383]: vdev I/O failure, zpool=mick path=/dev/da1 offset=4000786423808 size=8192 error=5
	Apr 11 03:01:09 <23.4> mick ZFS[48387]: vdev I/O failure, zpool=mick path=/dev/da1 offset=4000786685952 size=8192 error=5
	Apr 11 03:01:09 <23.4> mick ZFS[48391]: vdev I/O failure, zpool=mick path=/dev/da1 offset=270336 size=8192 error=5
	Apr 11 03:01:09 <23.3> mick ZFS[48395]: vdev probe failure, zpool=mick path=/dev/da1
	Apr 11 03:01:09 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:09 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:09 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain
	Apr 11 03:01:10 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:10 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:10 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain
	Apr 11 03:01:10 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:10 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:10 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain
	Apr 11 03:01:11 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:11 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:11 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain
	Apr 11 03:01:11 <0.2> mick kernel: (da1:umass-sim1:1:0:0): READ(10). CDB: 28 00 00 00 02 10 00 00 10 00 
	Apr 11 03:01:11 <0.2> mick kernel: (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
	Apr 11 03:01:11 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Error 5, Retries exhausted
	Apr 11 03:01:11 <23.3> mick ZFS[48399]: vdev probe failure, zpool=mick path=/dev/da1
	Apr 11 03:01:11 <23.5> mick ZFS[48403]: vdev state changed, pool_guid=7854867548980906247 vdev_guid=14874313982910104056
	Apr 11 03:01:16 <0.2> mick kernel: (da1:umass-sim1:1:0:0): got CAM status 0x44
	Apr 11 03:01:16 <0.2> mick kernel: (da1:umass-sim1:1:0:0): fatal error, failed to attach to device
	Apr 11 03:01:16 <0.2> mick kernel: da1 at umass-sim1 bus 1 scbus1 target 0 lun 0
	Apr 11 03:01:16 <0.2> mick kernel: da1: <LaCie Rugged USB-C 1153>  s/n 0000NL6AE1LV detached
	Apr 11 03:01:16 <0.2> mick kernel: (da1:umass-sim1:1:0:0): Periph destroyed

Poul-Henning

PS: "completed with an error" without any details about the error is not very helpful for debugging...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.