[Bug 261324] xhci1 resets on ROCKPro64 when reading from ZFS pool
Date: Wed, 19 Jan 2022 01:08:07 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261324

            Bug ID: 261324
           Summary: xhci1 resets on ROCKPro64 when reading from ZFS pool
           Product: Base System
           Version: 13.0-STABLE
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: arm
          Assignee: freebsd-arm@FreeBSD.org
          Reporter: freebsd@toyingwithfate.com

I have a ROCKPro64 single board computer with 4GB of RAM that I am attempting to use as a small NAS. The storage consists of an encrypted (OpenZFS native, not GELI) RAID-Z pool using three 8TB SATA spinning disks and one 512GB SATA SSD (used as cache) in a simple 4-disk JBOD USB enclosure (the "OWC Mercury Elite Pro Quad"). My understanding is that it contains a USB 3 hub with four SATA-to-USB bridges in it.

The issue I run into is that after reading ~40GB of data from the ZFS filesystem, disk I/O stops and I get the following messages in dmesg:

xhci1: Resetting controller
uhub4: at usbus5, port 1, addr 1 (disconnected)
ugen5.7: <OWC Mercury Elite Pro Quad> at usbus5 (disconnected)
uhub7: at uhub4, port 1, addr 6 (disconnected)
uhub7: detached
ugen5.2: <OWC Mercury Elite Pro Quad> at usbus5 (disconnected)
uhub6: at uhub4, port 2, addr 1 (disconnected)
ugen5.3: <OWC Mercury Elite Pro Quad A> at usbus5 (disconnected)
(da0:umass-sim0:0:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5c f8 00 00 00 80 00 00
umass0: at uhub6, port 1, addr 2 (disconnected)
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da0:umass-sim0:0:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5c f8 00 00 00 80 00 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da0:umass-sim0:0:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5c f8 00 00 00 80 00 00
(da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <Mercury Elite Pro Quad A 0> s/n 123456789004 detached
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 00 80 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 00 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 3 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 3 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 00 80 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 00 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 2 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 2 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 00 80 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 00 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 1 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 1 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 00 80 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 00 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 0 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 0 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 00 80 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Error 5, Retries exhausted
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 00 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Error 5, Retries exhausted
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Error 5, Retries exhausted
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 08 00 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 01 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 3 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 3 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 08 00 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 01 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 2 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 2 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 08 00 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 01 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 1 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 1 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 08 00 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 01 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Retrying command, 0 more tries remain
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Retrying command, 0 more tries remain
(da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 08 00 00 00
(da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error
(da1:umass-sim1:1:0:0): Error 5, Retries exhausted
Jan 1 05:47:13 generic ZFS[5395]: vdev probe failure, zpool=zraid-v5 path=/dev/da1
Jan 1 05:47:13 generic ZFS[5399]: vdev probe failure, zpool=zraid-v5 path=/dev/da1
(da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 01 80 00 00
(da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error
(da2:umass-sim2:2:0:0): Error 5, Retries exhausted
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O failure and has been suspended.
Jan 1 05:47:14 generic ZFS[5423]: vdev probe failure, zpool=zraid-v5 path=/dev/da2
Jan 1 05:47:14 generic ZFS[5447]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5495]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5499]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5503]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5507]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5511]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5515]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5519]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5523]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5527]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5531]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5535]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5539]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5543]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5547]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5551]: catastrophic pool I/O failure, zpool=zraid-v5
Jan 1 05:47:15 generic ZFS[5555]: catastrophic pool I/O failure, zpool=zraid-v5
(da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00
(da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error
(da3:umass-sim3:3:0:0): Error 5, Retries exhausted

...and so on, including further complaints from Solaris, ZFS, and CAM.

The system remains operable and I can SSH in, but the ZFS pool is unresponsive. I can tell FreeBSD to shut down, and although it starts to do so (disconnecting me from SSH), it never finishes powering off: SSH connections are refused but the board still responds to pings. If I unplug the SBC's power, disconnect the enclosure's USB cable (the ROCKPro64 won't boot with it connected; I'll report that separately), boot the SBC, and then reconnect the USB cable, things start working again. (The pool will sometimes report errors that I need to clear, but a scrub confirms that no data was corrupted.)

Interestingly, I seem to be able to scrub the pool just fine: despite the pool holding several TB of data, the issue has never appeared during a scrub. I can also use dd to read from the individual block devices for many hours without issue, even from all the disks simultaneously (averaging ~100MB/s per disk, so ~400MB/s total). But if I mount the ZFS filesystem (read-only during my tests, to reduce risk to the data) and start making a tarball of it (writing the tar stream to /dev/null), it gets about 40GB in and then the issue appears. The exact amount of data read before the failure (as measured by piping the stream through pv) varies a bit: my last three tests failed at 43.3, 40.6, and 54.2 GiB, respectively. The issue is 100% repeatable: the amount of data read before failing varies, but it always fails.
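For reference, the test sequence is roughly the following. These commands are an approximate reconstruction rather than a copy of my shell history, and the mountpoint /zraid-v5 is just where the pool happens to mount on my system:

  # Raw-device read test: runs for hours across all four disks with no errors.
  dd if=/dev/da0 of=/dev/null bs=1m &
  dd if=/dev/da1 of=/dev/null bs=1m &
  dd if=/dev/da2 of=/dev/null bs=1m &
  dd if=/dev/da3 of=/dev/null bs=1m &

  # Filesystem read test: import/mount read-only, then stream the filesystem
  # through tar and pv to /dev/null. This is what triggers the xhci1 reset
  # after roughly 40-55 GiB.
  zpool import -o readonly=on zraid-v5
  zfs load-key -r zraid-v5      # pool uses native OpenZFS encryption
  zfs mount -a
  tar cf - /zraid-v5 | pv > /dev/null

  # Recovery after a reset (power-cycle and re-attach as described above):
  zpool clear zraid-v5
  zpool scrub zraid-v5
  zpool status zraid-v5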
I've tried both the FreeBSD-13.0-RELEASE-arm64-aarch64-ROCKPRO64.img.xz and FreeBSD-13.0-STABLE-arm64-aarch64-ROCKPRO64-20211230-3684bb89d52-248759.img images; both exhibit the issue. I've also tried Linux on the same SBC with the same ZFS pool, and it had no issue. Likewise, a Raspberry Pi 4 running FreeBSD 13.0-RELEASE with the same ZFS pool had no issue.

I should note that both the Linux and Raspberry Pi tests lacked hardware crypto support (Linux because ZFS on Linux apparently doesn't support ARM crypto yet, and the Raspberry Pi because it has no crypto hardware), so both relied on software crypto, which constrained throughput to tens of MB/s rather than the hundreds of MB/s that FreeBSD on the ROCKPro64 reaches in my problem case. For comparison, I also used the same ZFS pool on an Intel system running FreeBSD 13, which does have working hardware crypto; it worked with the disks even faster than the ROCKPro64, with no issues.

So my suspicion is that the enclosure is not the culprit, since it worked fine on the Raspberry Pi and Intel systems. I also suspect the SBC itself is not the culprit, since it worked fine under Linux, though it's possible the issue only shows up when reading from the disks at a particular speed and Linux's constrained crypto performance simply never reached that speed. I'm also able to work with other disks just fine with FreeBSD on the ROCKPro64: only this multi-disk enclosure has trouble. I can create an encrypted ZFS pool on a USB 3 SSD connected directly to the SBC (no intermediary hub) and it works fine, no trouble at all.

Initially I suspected my SBC or the enclosure might be at fault, but through this testing I've come to suspect instead that there is some sort of issue in the xhci driver for the ROCKPro64, perhaps related to the enclosure having a USB hub in front of the disks. But I am only speculating! Could someone help me troubleshoot this further, to see why xhci1 is resetting unexpectedly?

Thank you for your help!

--
You are receiving this mail because:
You are the assignee for the bug.