[Bug 262969] NVMe - Resetting controller due to a timeout and possible hot unplug
Date: Fri, 01 Apr 2022 05:45:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262969

            Bug ID: 262969
           Summary: NVMe - Resetting controller due to a timeout and possible hot unplug
           Product: Base System
           Version: 13.1-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ibrennan@netgrade.com

I am seeing very unstable NVMe behavior on TrueNAS 12 and 13. I'm using Western Digital Black NVMe M.2 SSDs, specifically the WD_BLACK SN770 in 250GB and 2TB capacities, firmware version 731030WD. This is on TrueNAS-12.0-U8 / FreeBSD 12.2-RELEASE-p12 (I later upgraded to 13 Beta, see below).

I set hw.nvme.per_cpu_io_queues=0, and it did not fix the problem; in fact it seems to have made it more frequent, although I'm not 100% sure about that and need to test again. I also set hw.nvme.use_nvd=0 to switch from the nvd driver to nda, which doesn't seem to make a difference either, although the log output was slightly different when the issue happened again.

See the logs below; I would be grateful if somebody could help with this problem.
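For reference, both of these are boot-time loader tunables rather than runtime sysctls. A minimal sketch of how they are typically set in /boot/loader.conf (values as described above; on TrueNAS the exact placement may differ, since it manages tunables through its own UI):

    # /boot/loader.conf -- read at boot, a reboot is needed for changes to take effect
    hw.nvme.per_cpu_io_queues="0"   # one shared I/O queue pair per controller instead of one per CPU
    hw.nvme.use_nvd="0"             # attach namespaces as nda(4) via CAM instead of nvd(4)

After a reboot the values can be checked with "sysctl hw.nvme.use_nvd hw.nvme.per_cpu_io_queues", and the attached controllers listed with "nvmecontrol devlist".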
Mar 29 21:42:25 truenas nvme5: Resetting controller due to a timeout and possible hot unplug.
Mar 29 21:42:25 truenas nvme5: resetting controller
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:120 nsid:1 lba:1497544880 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:120 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:123 nsid:1 lba:198272936 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:123 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:13 cid:121 nsid:1 lba:431014528 len:24
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:13 cid:121 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:15 cid:127 nsid:1 lba:864636432 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:126 nsid:1 lba:2445612184 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:126 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:120 nsid:1 lba:430503600 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:120 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:18 cid:123 nsid:1 lba:1499051024 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:123 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: WRITE sqid:18 cid:124 nsid:1 lba:1990077368 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:122 nsid:1 lba:1237765696 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:122 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:125 nsid:1 lba:180758264 len:16
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:20 cid:121 nsid:1 lba:2445612192 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:20 cid:121 cdw0:0
Mar 29 21:42:26 truenas nvd5: detached

nvme3: Resetting controller due to a timeout and possible hot unplug.
nvme3: resetting controller
nvme3: failing outstanding i/o
nvme3: READ sqid:7 cid:127 nsid:1 lba:419546528 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:7 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=1901c5a0 0 7 0 0 0
failing outstanding i/o
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nvme3: READ sqid:11 cid:127 nsid:1 lba:782841288 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2ea935c8 0 7 0 0 0
failing outstanding i/o
nvme3: READ sqid:11 cid:123 nsid:1 lba:704576056 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:12 cid:127 nsid:1 lba:1016402352 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:12 cid:125 nsid:1 lba:1824854760 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
WRITE sqid:13 cid:124 nsid:1 lba:1008638008 len:64
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:13 cid:125 nsid:1 lba:1008638152 len:56
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:15 cid:127 nsid:1 lba:783188688 len:8
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=29fefa38 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c9511b0 0 7 0 0 0
ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:15 cid:123 nsid:1 lba:1008553080 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:15 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:124 nsid:1 lba:147012776 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=6cc512e8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e9838 0 3f 0 0 0
ABORTED - BY REQUEST (00/07) sqid:16 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:127 nsid:1 lba:2881895592 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:16 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:17 cid:127 nsid:1 lba:2574392744 len:16
nvme3: ABORTED - BY REQUEST (00/07) sqid:17 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:18 cid:126 nsid:1 lba:155895056 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e98c8 0 37 0 0 0
ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:19 cid:125 nsid:1 lba:151377120 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2eae82d0 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1d4c78 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=8c33ca8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=abc63ca8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=99721da8 0 f 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=94ac510 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=905d4e0 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nda3 at nvme3 bus 0 scbus13 target 0 lun 1
nda3: <WD_BLACK SN770 2TB 731030WD 21513C800057> s/n 21513C800057 detached
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file

After Upgrading to 13.0-STABLE I see this:

nvme4: RECOVERY_START 166765538822 vs 166329346273
nvme4: Controller in fatal status, resetting
nvme4: Resetting controller due to a timeout and possible hot unplug.
nvme4: RECOVERY_WAITING
nvme4: resetting controller
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:127 nsid:1 lba:32 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:127 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:126 nsid:1 lba:544 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:125 nsid:1 lba:3907028000 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:125 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:124 nsid:1 lba:3907028512 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0
nvd4: detached

-- 
You are receiving this mail because:
You are the assignee for the bug.