From nobody Sun Sep 17 15:21:28 2023 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RpWqT5hSVz4t42h for ; Sun, 17 Sep 2023 15:21:41 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RpWqT43j2z4PH6 for ; Sun, 17 Sep 2023 15:21:41 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-9adb9fa7200so732696066b.0 for ; Sun, 17 Sep 2023 08:21:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1694964100; x=1695568900; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=b0qnDlGaCKjud7ojWbQI0q5nseDEZY29BGmYG5+BfU0=; b=2L2QYC2T6Ns1K2JeWm4WOpUDF+3N/KepZlHLgLTtk4lyJBcAmOTU0IGnQ/v+bTjvpd n7qEvqaPvpyuet7J1zGP1soJhi85LnRk+a/i0WZu+7tU0GcDCFnuxnD23sDwmeYVBseF g2mAQB29vu4g5jB/VWQcA2YVEojaSHLO6CdzNePeZJkRedUs96YV4Smcq3wsRURkvbjU 5SoejWn6z/uvmtFXdsm+4z8kUM6VChYoos+tsWi7m/qKIZcRQQthQXNFmz1uWM955apk 8uxR+DM7U7DjjFdEJSM1vy32LBi232jCjque7XI+3RHtQy9gbCCK6cniYoaM1c/OR2j0 7Q5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694964100; x=1695568900; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=b0qnDlGaCKjud7ojWbQI0q5nseDEZY29BGmYG5+BfU0=; b=D8klPkVrXQeUp3UU5m5brQTLUlnB2lKGHKkVd0yrNQiToOe8AnujvXtOsDZ1bel/L7 bIATQP2jRznZQjR/gKYyJqRGUHRBz7+0tfv/gJzVQZ2O6EmFM4Uvx+w43AuSDzWqCluq HGVdkcJrdMONQod1Lm4kNh3GgHJwX0IIwn6TaKmy504H4FoEdQQtDIK88m3cIRjEGmiS w5YN/kxrWvw1GAndIJI9R7yY5tLVWqluB8DDM03lMKeJHZnZkcjrvWoJvig51zaqcVCL APWQ1rhKOYVkNYn0JMFnQMSoNQIpVmiwrbkYgGD3+9ch+l8jJG1hDo0LF1Z67Ph7rOtK W3Bw== X-Gm-Message-State: AOJu0YzPrCahA5qLJ39yDxnRr6q4bpOSDOPaHaByyBkpoD7FMe2PVM6O 8SSSZICh5k4BoOYsMnP6Qc3h/hxYbvxBj4+Y9yI9ow== X-Google-Smtp-Source: AGHT+IHsr0egGU/Zp7gwu+3U+9GD5/4HnCk2ETS1UTsKEdhjOJ7FswfbK53lbQxiHuE2rYwiHHslQ3PVegYzMUYMACU= X-Received: by 2002:a17:907:3e91:b0:9aa:206d:b052 with SMTP id hs17-20020a1709073e9100b009aa206db052mr14057731ejc.27.1694964099586; Sun, 17 Sep 2023 08:21:39 -0700 (PDT) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 References: <202309171516.38HFGT2g089933@gitrepo.freebsd.org> In-Reply-To: <202309171516.38HFGT2g089933@gitrepo.freebsd.org> From: Warner Losh Date: Sun, 17 Sep 2023 16:21:28 +0100 Message-ID: Subject: Re: git: d95431624f93 - main - nvme: Give up when we've failed To: Warner Losh Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Content-Type: multipart/alternative; boundary="0000000000001030d506058f954c" X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Rspamd-Queue-Id: 4RpWqT43j2z4PH6 --0000000000001030d506058f954c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, Sep 17, 2023 at 4:16=E2=80=AFPM Warner Losh wrote= : > The branch main has been updated by imp: > > URL: > https://cgit.FreeBSD.org/src/commit/?id=3Dd95431624f934fe4740211738fc7878= 08005b14e > > commit d95431624f934fe4740211738fc787808005b14e > Author: Warner Losh > AuthorDate: 2023-09-15 16:02:32 +0000 > Commit: Warner Losh > CommitDate: 2023-09-17 15:11:56 +0000 > > nvme: Give up when we've failed > > Normally, we poll the device every so often to see if commands have > timed out. However, we'll go into the recovery state as part of faili= ng > the drive. To account for all possibilties, if we're failed when we g= et > into the polling function, just stop polling: Party is over. > > Sponsored by: Netflix > Reviewed by: jtl@ Differential Revision: https://reviews.freebsd.org/D41877 > --- > sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c > index b256c4713c8d..4e37aa0e1020 100644 > --- a/sys/dev/nvme/nvme_qpair.c > +++ b/sys/dev/nvme/nvme_qpair.c > @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg) > > mtx_assert(&qpair->recovery, MA_OWNED); > > + /* > + * If the controller has failed, give up. We're never going to > change > + * state from a failed controller: no further transactions are > possible. > + * We go ahead and let the timeout expire in many cases for > simplicity. > + */ > + if (qpair->ctrlr->is_failed) { > + nvme_printf(ctrlr, "Controller failed, giving up\n"); > + qpair->timer_armed =3D false; > + return; > + } > + > switch (qpair->recovery_state) { > case RECOVERY_NONE: > /* > @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg) > idle =3D false; /* We want to keep poll= ing > */ > break; > case RECOVERY_WAITING: > - nvme_printf(ctrlr, "waiting for reset to complete\n"); > - idle =3D false; /* We want to keep poll= ing > */ > + nvme_printf(ctrlr, "Waiting for reset to complete\n"); > + idle =3D false; /* We want to keep polling */ > break; > } > > --0000000000001030d506058f954c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Sun, Sep 17, 2023 at 4:16=E2=80=AF= PM Warner Losh <imp@freebsd.org&g= t; wrote:
The br= anch main has been updated by imp:

URL: https://cgit.= FreeBSD.org/src/commit/?id=3Dd95431624f934fe4740211738fc787808005b14e
commit d95431624f934fe4740211738fc787808005b14e
Author:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org>
AuthorDate: 2023-09-15 16:02:32 +0000
Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-09-17 15:11:56 +0000

=C2=A0 =C2=A0 nvme: Give up when we've failed

=C2=A0 =C2=A0 Normally, we poll the device every so often to see if command= s have
=C2=A0 =C2=A0 timed out. However, we'll go into the recovery state as p= art of failing
=C2=A0 =C2=A0 the drive. To account for all possibilties, if we're fail= ed when we get
=C2=A0 =C2=A0 into the polling function, just stop polling: Party is over.<= br>
=C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Netflix=

Reviewed by: jtl@
Differenti= al Revision:=C2=A0https://re= views.freebsd.org/D41877
=C2=A0
---
=C2=A0sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--
=C2=A01 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
index b256c4713c8d..4e37aa0e1020 100644
--- a/sys/dev/nvme/nvme_qpair.c
+++ b/sys/dev/nvme/nvme_qpair.c
@@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 mtx_assert(&qpair->recovery, MA_OWNED);<= br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/*
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * If the controller has failed, give up. We= 9;re never going to change
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * state from a failed controller: no further t= ransactions are possible.
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * We go ahead and let the timeout expire in ma= ny cases for simplicity.
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (qpair->ctrlr->is_failed) {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, = "Controller failed, giving up\n");
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0qpair->timer_arm= ed =3D false;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}
+
=C2=A0 =C2=A0 =C2=A0 =C2=A0 switch (qpair->recovery_state) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 case RECOVERY_NONE:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /*
@@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 idle =3D false;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We wan= t to keep polling */
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 case RECOVERY_WAITING:
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, = "waiting for reset to complete\n");
-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0idle =3D false;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We wan= t to keep polling */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nvme_printf(ctrlr, = "Waiting for reset to complete\n");
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0idle =3D false;=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* We want to keep polling */
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 break;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }

--0000000000001030d506058f954c--