From nobody Tue Jul 23 23:03:40 2024 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4WTCPT2ZsBz5R6VS; Tue, 23 Jul 2024 23:03:41 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R11" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4WTCPT19DKz4byv; Tue, 23 Jul 2024 23:03:41 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1721775821; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=ROGzW1NyMKfu55mvfbmTqQEEkjZEuxCXn18sw3pGAAA=; b=CJfnSteNyURHhaHBCNxwLgirv8CFHlP2Vmu+UglV0KHifAEpycT/TDPALqE5r/8BDjEYjZ wIhxwoWZQ3p1ZQ8F3OqaIMt5B8sNkP3Wjwf3n5wtLGvgCKLPD0Au6g/uBm3B6qHBTsISKO Hj/1OPPkRG6nY0LAYeH16Xk2N8cHmcte7q869Cz8vsWMWOb2Q3fKj/tRMHpPbjTP7/lz+P 7CEib6yRe6jM0CeC2fkjoS0S6Lw3unNuew5mHS/ZAxBUeYQstP7akLe6QzozL5FxHJiw3x khPGk7IQ04jtWrHmUMWPyxY/RXJvukn27ShmcJlmsPzrjFKn3O+R0fOLWHeZzw== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1721775821; a=rsa-sha256; cv=none; b=OlDbkbAom25A14arrlYsb5/mysuTYxB/oXuJsbjDT0r+ks0BEw6VeZJ+H0tkGzi37fyK1l W2fknKxTwFxBhgTxbOsR0OMcCd9LuXRlJ/GEYbebvggVNjJXjeR6iiPG1pDqBLeoTiXaCi MKjHO3j4PLm+7VuxsnDU9f0GXGGtsBzruw+la2QBmrX4tThro9knQHeFvZtfxdZzYpHiq9 pNCnFEqroxqmj+R9RORZtBDWSav1d9U9BC9STFd4FV9T2jd3VO6qTcoPqcOnONzrrf9nKp pBunJeg89sXUjUoKqM0QVz+rM8lryPVcQfU6xiwgC4kIzOo3Jfv0YBiYtwgYEg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1721775821; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=ROGzW1NyMKfu55mvfbmTqQEEkjZEuxCXn18sw3pGAAA=; b=elbTCPGAy/nxv1mZFoPzwVwk4SN33cpmvcm1oWso6OzGdm1ElFdfCPNB7sfn+kQ4GbbgEK 0aJCOMV7hdAELNqdGNcz1vi/aPrD6j9PXXwlFbakPjR9s0MxMi19Ee57QgLHQ+DzgWpesA gTGhhX6L73yc6H21itnemNMEUw4Rx1hYqDMQvAPuSKoxq9oPuT6EGYE1sW0faAdCTTXAoq w8OyePm8tqQLtzeUnvtvWmV9etItxTpTdvf81AbY/OK9JDnB6hWEH3scfzdeOXeE8IjyIh CgP4uhO+hqtJybcPvbXung3RwFGf0rlHvz2cGJLrv56Gt1sr8PwwmgSkf9v4aQ== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4WTCPT0V01zmmS; Tue, 23 Jul 2024 23:03:41 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 46NN3ejA008314; Tue, 23 Jul 2024 23:03:40 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 46NN3eRq008311; Tue, 23 Jul 2024 23:03:40 GMT (envelope-from git) Date: Tue, 23 Jul 2024 23:03:40 GMT Message-Id: <202407232303.46NN3eRq008311@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Warner Losh Subject: git: aa41354349c1 - main - nvme: Optimize timeout code further List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: imp X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: aa41354349c16ea7220893010df78b47d67d0f74 Auto-Submitted: auto-generated The branch main has been updated by imp: URL: https://cgit.FreeBSD.org/src/commit/?id=aa41354349c16ea7220893010df78b47d67d0f74 commit aa41354349c16ea7220893010df78b47d67d0f74 Author: Warner Losh AuthorDate: 2024-07-23 23:02:03 +0000 Commit: Warner Losh CommitDate: 2024-07-23 23:04:02 +0000 nvme: Optimize timeout code further Optimize timeout code based on three observations. (1) The tr queues are sorted in order of submission, so the first one that could time out is the first "real" one on the list. (2) Timeouts for a given queue are all the same length (well, except at startup, where timeout doesn't matter, and when you change it at runtime, where timeouts will still happen eventually and the difference isn't worth optimizing for). (3) Calling the ISR races the real ISR and we should avoid that better. So now, after checking to see if the card is there and working, the timeout routine scans the pending tracker list until it finds a non-AER tracker. If the deadline hasn't passed, we return, doing nothing further. Otherwise, we call poll completions and then process the list looking for timed out items. This should move the timeout routine to touching hardware only when it's really necessary. It thus avoids racing the normal ISR, while still timig out stuck transactions quickly enough. There was also some minor code motion to make all of the above flow more nicely for the reader. When interrupts aren't working at all, then this will increase latency somewhat. But when interrupts aren't working at all, there's bigger problems and we should poll quite often in that case. That will be handled in future commits. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46026 --- sys/dev/nvme/nvme_qpair.c | 138 ++++++++++++++++++++++++++++++---------------- 1 file changed, 90 insertions(+), 48 deletions(-) diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c index e4286b61a3fc..c917b34dbe43 100644 --- a/sys/dev/nvme/nvme_qpair.c +++ b/sys/dev/nvme/nvme_qpair.c @@ -1045,8 +1045,8 @@ nvme_qpair_timeout(void *arg) struct nvme_controller *ctrlr = qpair->ctrlr; struct nvme_tracker *tr; sbintime_t now; - bool idle = false; - bool needs_reset; + bool idle = true; + bool fast; uint32_t csts; uint8_t cfs; @@ -1092,23 +1092,69 @@ nvme_qpair_timeout(void *arg) */ csts = nvme_mmio_read_4(ctrlr, csts); cfs = NVMEV(NVME_CSTS_REG_CFS, csts); - if (csts == NVME_GONE || cfs == 1) - goto do_reset; + if (csts == NVME_GONE || cfs == 1) { + /* + * We've had a command timeout that we weren't able to + * abort or we have aborts disabled and any command + * timed out. + * + * If we get here due to a possible surprise hot-unplug + * event, then we let nvme_ctrlr_reset confirm and fail + * the controller. + */ +do_reset: + nvme_printf(ctrlr, "Resetting controller due to a timeout%s.\n", + (csts == 0xffffffff) ? " and possible hot unplug" : + (cfs ? " and fatal error status" : "")); + qpair->recovery_state = RECOVERY_WAITING; + nvme_ctrlr_reset(ctrlr); + idle = false; + break; + } + /* - * Process completions. We already have the recovery lock, so - * call the locked version. + * See if there's any recovery needed. First, do a fast check to + * see if anything could have timed out. If not, then skip + * everything else. + */ + fast = false; + mtx_lock(&qpair->lock); + now = getsbinuptime(); + TAILQ_FOREACH(tr, &qpair->outstanding_tr, tailq) { + /* + * Skip async commands, they are posted to the card for + * an indefinite amount of time and have no deadline. + */ + if (tr->deadline == SBT_MAX) + continue; + + /* + * If the first real transaction is not in timeout, then + * we're done. Otherwise, we try recovery. + */ + idle = false; + if (now <= tr->deadline) + fast = true; + break; + } + mtx_unlock(&qpair->lock); + if (idle || fast) + break; + + /* + * There's a stale transaction at the start of the queue whose + * deadline has passed. Poll the competions as a last-ditch + * effort in case an interrupt has been missed. */ _nvme_qpair_process_completions(qpair); /* - * Check to see if we need to timeout any commands. If we do, then - * we also enter a recovery phase. + * Now that we've run the ISR, re-rheck to see if there's any + * timed out commands and abort them or reset the card if so. */ - now = getsbinuptime(); - needs_reset = false; - idle = true; mtx_lock(&qpair->lock); + idle = true; TAILQ_FOREACH(tr, &qpair->outstanding_tr, tailq) { /* * Skip async commands, they are posted to the card for @@ -1116,48 +1162,44 @@ nvme_qpair_timeout(void *arg) */ if (tr->deadline == SBT_MAX) continue; - if (now > tr->deadline) { - if (tr->req->cb_fn != nvme_abort_complete && - ctrlr->enable_aborts) { - /* - * This isn't an abort command, ask - * for a hardware abort. - */ - nvme_ctrlr_cmd_abort(ctrlr, tr->cid, - qpair->id, nvme_abort_complete, tr); - } else { - /* - * Otherwise we have a live command in - * the card (either one we couldn't - * abort, or aborts weren't enabled). - * The only safe way to proceed is to do - * a reset. - */ - needs_reset = true; - } + + /* + * If we know this tracker hasn't timed out, we also + * know all subsequent ones haven't timed out. The tr + * queue is in submission order and all normal commands + * in a queue have the same timeout (or the timeout was + * changed by the user, but we eventually timeout then). + */ + idle = false; + if (now <= tr->deadline) + break; + + /* + * Timeout expired, abort it or reset controller. + */ + if (ctrlr->enable_aborts && + tr->req->cb_fn != nvme_abort_complete) { + /* + * This isn't an abort command, ask for a + * hardware abort. This goes to the admin + * queue which will reset the card if it + * times out. + */ + nvme_ctrlr_cmd_abort(ctrlr, tr->cid, qpair->id, + nvme_abort_complete, tr); } else { - idle = false; + /* + * We have a live command in the card (either + * one we couldn't abort, or aborts weren't + * enabled). We can only reset. + */ + mtx_unlock(&qpair->lock); + goto do_reset; } } mtx_unlock(&qpair->lock); - if (!needs_reset) - break; - - /* - * We've had a command timeout that we weren't able to abort - * - * If we get here due to a possible surprise hot-unplug event, - * then we let nvme_ctrlr_reset confirm and fail the - * controller. - */ - do_reset: - nvme_printf(ctrlr, "Resetting controller due to a timeout%s.\n", - (csts == 0xffffffff) ? " and possible hot unplug" : - (cfs ? " and fatal error status" : "")); - qpair->recovery_state = RECOVERY_WAITING; - nvme_ctrlr_reset(ctrlr); - idle = false; /* We want to keep polling */ break; + case RECOVERY_WAITING: /* * These messages aren't interesting while we're suspended. We