From nobody Tue Jul 23 23:03:42 2024
Date: Tue, 23 Jul 2024 23:03:42 GMT
Message-Id: <202407232303.46NN3gwY008359@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Warner Losh
Subject: git: bb7f7d5b5201 - main - nvme: Warn if there's system interrupt issues.
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
X-BeenThere: dev-commits-src-main@freebsd.org
Sender: owner-dev-commits-src-main@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: imp
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: bb7f7d5b5201cfe569fce79b0f325bec2cf38ad2
Auto-Submitted: auto-generated

The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=bb7f7d5b5201cfe569fce79b0f325bec2cf38ad2

commit bb7f7d5b5201cfe569fce79b0f325bec2cf38ad2
Author:     Warner Losh
AuthorDate: 2024-07-23 23:02:33 +0000
Commit:     Warner Losh
CommitDate: 2024-07-23 23:04:03 +0000

    nvme: Warn if there's system interrupt issues.

    Issue a warning if we have system interrupt issues. If you get this
    warning, then we submitted a request, it timed out without an interrupt
    being posted, but when we polled the card's completion, we found
    completion events. This indicates that we're missing interrupts, and to
    date all the times I've helped people track issues like this down it has
    been a system issue, not an NVMe driver issue.

    Sponsored by:           Netflix
    Reviewed by:            gallatin
    Differential Revision:  https://reviews.freebsd.org/D46031
---
 share/man/man4/nvme.4       | 9 +++++++++
 sys/dev/nvme/nvme_private.h | 1 +
 sys/dev/nvme/nvme_qpair.c   | 9 +++++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/share/man/man4/nvme.4 b/share/man/man4/nvme.4
index 011ff483c839..dcd2ec86f5fa 100644
--- a/share/man/man4/nvme.4
+++ b/share/man/man4/nvme.4
@@ -239,6 +239,15 @@ detects that the AHCI device supports RST and when it is enabled.
 See
 .Xr ahci 4
 for more details.
+.Sh DIAGNOSTICS
+.Bl -diag
+.It "nvme%d: System interrupt issues?"
+The driver found a timed-out transaction had a pending completion record,
+indicating an interrupt had not been delivered.
+The system is either not configuring interrupts properly, or the system drops
+them under load.
+This message will appear at most once per boot per controller.
+.El
 .Sh SEE ALSO
 .Xr nda 4 ,
 .Xr nvd 4 ,
diff --git a/sys/dev/nvme/nvme_private.h b/sys/dev/nvme/nvme_private.h
index ff08f6581db5..05b5f3189eb2 100644
--- a/sys/dev/nvme/nvme_private.h
+++ b/sys/dev/nvme/nvme_private.h
@@ -303,6 +303,7 @@ struct nvme_controller {
 
 	bool				is_failed;
 	bool				is_dying;
+	bool				isr_warned;
 	STAILQ_HEAD(, nvme_request)	fail_req;
 
 	/* Host Memory Buffer */
diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
index c917b34dbe43..0c3a36d4d76f 100644
--- a/sys/dev/nvme/nvme_qpair.c
+++ b/sys/dev/nvme/nvme_qpair.c
@@ -1145,9 +1145,14 @@ do_reset:
 		/*
 		 * There's a stale transaction at the start of the queue whose
 		 * deadline has passed. Poll the competions as a last-ditch
-		 * effort in case an interrupt has been missed.
+		 * effort in case an interrupt has been missed. Warn the user if
+		 * transactions were found of possible interrupt issues, but
+		 * just once per controller.
 		 */
-		_nvme_qpair_process_completions(qpair);
+		if (_nvme_qpair_process_completions(qpair) && !ctrlr->isr_warned) {
+			nvme_printf(ctrlr, "System interrupt issues?\n");
+			ctrlr->isr_warned = true;
+		}
 
 		/*
 		 * Now that we've run the ISR, re-rheck to see if there's any