Re: git: d95431624f93 - main - nvme: Give up when we've failed

From: Warner Losh <imp_at_bsdimp.com>
Date: Sun, 17 Sep 2023 15:21:28 UTC
On Sun, Sep 17, 2023 at 4:16 PM Warner Losh <imp@freebsd.org> wrote:

> The branch main has been updated by imp:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=d95431624f934fe4740211738fc787808005b14e
>
> commit d95431624f934fe4740211738fc787808005b14e
> Author:     Warner Losh <imp@FreeBSD.org>
> AuthorDate: 2023-09-15 16:02:32 +0000
> Commit:     Warner Losh <imp@FreeBSD.org>
> CommitDate: 2023-09-17 15:11:56 +0000
>
>     nvme: Give up when we've failed
>
>     Normally, we poll the device every so often to see if commands have
>     timed out. However, we'll go into the recovery state as part of failing
>     the drive. To account for all possibilties, if we're failed when we get
>     into the polling function, just stop polling: Party is over.
>
>     Sponsored by:           Netflix
>

Reviewed by: jtl@
Differential Revision: https://reviews.freebsd.org/D41877


> ---
>  sys/dev/nvme/nvme_qpair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
> index b256c4713c8d..4e37aa0e1020 100644
> --- a/sys/dev/nvme/nvme_qpair.c
> +++ b/sys/dev/nvme/nvme_qpair.c
> @@ -1011,6 +1011,17 @@ nvme_qpair_timeout(void *arg)
>
>         mtx_assert(&qpair->recovery, MA_OWNED);
>
> +       /*
> +        * If the controller has failed, give up. We're never going to
> change
> +        * state from a failed controller: no further transactions are
> possible.
> +        * We go ahead and let the timeout expire in many cases for
> simplicity.
> +        */
> +       if (qpair->ctrlr->is_failed) {
> +               nvme_printf(ctrlr, "Controller failed, giving up\n");
> +               qpair->timer_armed = false;
> +               return;
> +       }
> +
>         switch (qpair->recovery_state) {
>         case RECOVERY_NONE:
>                 /*
> @@ -1094,8 +1105,8 @@ nvme_qpair_timeout(void *arg)
>                 idle = false;                   /* We want to keep polling
> */
>                 break;
>         case RECOVERY_WAITING:
> -               nvme_printf(ctrlr, "waiting for reset to complete\n");
> -               idle = false;                   /* We want to keep polling
> */
> +               nvme_printf(ctrlr, "Waiting for reset to complete\n");
> +               idle = false;           /* We want to keep polling */
>                 break;
>         }
>
>