From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 262969] NVMe - Resetting controller due to a timeout and possible hot unplug
Date: Fri, 01 Apr 2022 05:45:42 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262969

Bug ID: 262969
Summary: NVMe - Resetting controller due to a timeout and possible hot unplug
Product: Base System
Version: 13.1-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: ibrennan@netgrade.com

I am seeing very unstable NVMe behavior on TrueNAS 12 and 13. I'm using Western Digital Black NVMe M.2 SSDs. The exact models are WD_BLACK SN770 250GB and 2TB, firmware version 731030WD.

This is on TrueNAS version: TrueNAS-12.0-U8 / FreeBSD: 12.2-RELEASE-p12 (I later upgraded to 13 Beta, see below).

I set hw.nvme.per_cpu_io_queues=0, and it did not fix the problem. In fact, it seems to have made it much more frequent, although I'm not 100% sure about that and need to test again.

I also tried hw.nvme.use_nvd=0, which switches from the nvd driver to nda. It doesn't seem to make a difference, although the log looked slightly different when the issue happened again.

See the logs below. I would be grateful if somebody could help with this problem.

Mar 29 21:42:25 truenas nvme5: Resetting controller due to a timeout and possible hot unplug.
Mar 29 21:42:25 truenas nvme5: resetting controller
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:120 nsid:1 lba:1497544880 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:120 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:123 nsid:1 lba:198272936 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:123 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:13 cid:121 nsid:1 lba:431014528 len:24
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:13 cid:121 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:15 cid:127 nsid:1 lba:864636432 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:126 nsid:1 lba:2445612184 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:126 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:120 nsid:1 lba:430503600 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:120 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:18 cid:123 nsid:1 lba:1499051024 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:123 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: WRITE sqid:18 cid:124 nsid:1 lba:1990077368 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:122 nsid:1 lba:1237765696 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:122 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:125 nsid:1 lba:180758264 len:16
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:20 cid:121 nsid:1 lba:2445612192 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:20 cid:121 cdw0:0
Mar 29 21:42:26 truenas nvd5: detached

nvme3: Resetting controller due to a timeout and possible hot unplug.
nvme3: resetting controller
nvme3: failing outstanding i/o
nvme3: READ sqid:7 cid:127 nsid:1 lba:419546528 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:7 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=1901c5a0 0 7 0 0 0
failing outstanding i/o
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nvme3: READ sqid:11 cid:127 nsid:1 lba:782841288 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2ea935c8 0 7 0 0 0
failing outstanding i/o
nvme3: READ sqid:11 cid:123 nsid:1 lba:704576056 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:12 cid:127 nsid:1 lba:1016402352 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:12 cid:125 nsid:1 lba:1824854760 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
WRITE sqid:13 cid:124 nsid:1 lba:1008638008 len:64
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:13 cid:125 nsid:1 lba:1008638152 len:56
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:15 cid:127 nsid:1 lba:783188688 len:8
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=29fefa38 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c9511b0 0 7 0 0 0
ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:15 cid:123 nsid:1 lba:1008553080 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:15 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:124 nsid:1 lba:147012776 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=6cc512e8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e9838 0 3f 0 0 0
ABORTED - BY REQUEST (00/07) sqid:16 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:127 nsid:1 lba:2881895592 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:16 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:17 cid:127 nsid:1 lba:2574392744 len:16
nvme3: ABORTED - BY REQUEST (00/07) sqid:17 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:18 cid:126 nsid:1 lba:155895056 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e98c8 0 37 0 0 0
ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:19 cid:125 nsid:1 lba:151377120 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2eae82d0 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1d4c78 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=8c33ca8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=abc63ca8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=99721da8 0 f 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=94ac510 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=905d4e0 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nda3 at nvme3 bus 0 scbus13 target 0 lun 1
nda3: s/n 21513C800057 detached
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file

After Upgrading to 13.0-STABLE I see this:

nvme4: RECOVERY_START 166765538822 vs 166329346273
nvme4: Controller in fatal status, resetting
nvme4: Resetting controller due to a timeout and possible hot unplug.
nvme4: RECOVERY_WAITING
nvme4: resetting controller
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:127 nsid:1 lba:32 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:127 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:126 nsid:1 lba:544 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:125 nsid:1 lba:3907028000 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:125 cdw0:0
nvme4: failing outstanding i/o
nvme4: READ sqid:18 cid:124 nsid:1 lba:3907028512 len:224
nvme4: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0
nvd4: detached

-- 
You are receiving this mail because:
You are the assignee for the bug.