From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 283189] Sporadic NVMe DMAR faults since updating to 14.2-STABLE
Date: Wed, 22 Jan 2025 03:04:23 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 14.2-STABLE
X-Bugzilla-Keywords: regression
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: jah@FreeBSD.org
X-Bugzilla-Status: New

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283189

--- Comment #1 from Jason A. Harmening ---
Reverting from nda(4) to nvd(4) didn't resolve this issue; if anything, it
appeared to make it slightly worse. I did enable NVMe verbose command logging,
which yielded error logs like the following:

nvme0: WRITE sqid:14 cid:120 nsid:1 lba:475732976 len:96
DMAR4: nvme0: pci7:0:0 sid 700 fault acc 1 adt 0x0 reason 0x6 addr a5243000
nvme0: nsid:0x1 rsvd2:0 rsvd3:0 mptr:0 prp1:0xa5243000 prp2:0xcf800
nvme0: cdw10: 0x1c5b1bf0 cdw11:0 cdw12:0x5f cdw13:0 cdw14:0 cdw15:0
nvme0: DATA TRANSFER ERROR (00/04) crd:0 m:1 dnr:1 p:0 sqid:14 cid:120 cdw0:0

The errors continue to always be for NVMe writes (i.e. a DMA read access by
the controller). I've also still never seen these faults for any device
besides nvme, and all still show the same DMAR fault code and similar small
transfer sizes.

Interestingly, all of the errors I've seen so far (about 15 of them since
enabling verbose logging) show the DMAR fault being taken against the buffer
in PRP1, even in cases in which PRP2 is populated. So it seems the NVMe access
that triggers the fault is always at the beginning of the region mapped by the
NVMe command.

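For anyone wanting to cross-check logs of this shape, here is a minimal sketch
(not part of the original report) that decodes the raw command dwords and
compares the DMAR fault address with PRP1. It assumes the standard NVMe
Read/Write layout, in which CDW10/CDW11 hold the starting LBA and bits 15:0 of
CDW12 hold the zero-based block count. The helper names `field` and `decode`
are hypothetical.

```python
import re

# Log lines copied verbatim from the comment above.
LOG = """\
nvme0: WRITE sqid:14 cid:120 nsid:1 lba:475732976 len:96
DMAR4: nvme0: pci7:0:0 sid 700 fault acc 1 adt 0x0 reason 0x6 addr a5243000
nvme0: nsid:0x1 rsvd2:0 rsvd3:0 mptr:0 prp1:0xa5243000 prp2:0xcf800
nvme0: cdw10: 0x1c5b1bf0 cdw11:0 cdw12:0x5f cdw13:0 cdw14:0 cdw15:0
"""


def field(name, text):
    """Return the integer after 'name:' (hex if 0x-prefixed, else decimal)."""
    m = re.search(rf"\b{name}:\s*(0x[0-9a-fA-F]+|\d+)", text)
    if m is None:
        raise KeyError(name)
    return int(m.group(1), 0)


def decode(text):
    """Decode the command dwords and the DMAR fault address from one log excerpt."""
    cdw10 = field("cdw10", text)
    cdw11 = field("cdw11", text)
    cdw12 = field("cdw12", text)
    # The DMAR line prints the faulting address as bare hex after 'addr'.
    fault = re.search(r"fault .*\baddr ([0-9a-fA-F]+)", text)
    if fault is None:
        raise KeyError("addr")
    return {
        "slba": (cdw11 << 32) | cdw10,   # starting LBA (assumed layout)
        "nlb": (cdw12 & 0xFFFF) + 1,     # block count; the field is zero-based
        "prp1": field("prp1", text),
        "prp2": field("prp2", text),
        "fault_addr": int(fault.group(1), 16),
    }


info = decode(LOG)
print(info["slba"] == field("lba", LOG))   # dwords agree with the decoded lba
print(info["nlb"] == field("len", LOG))    # dwords agree with the decoded len
print(info["fault_addr"] == info["prp1"])  # fault taken against the PRP1 buffer
```

Running the same comparison over every logged fault would be one way to
confirm that the faulting address is always PRP1 and never PRP2.
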
This "smells" like the sort of issue I'm used to seeing at $work on
weakly-ordered arm64 devices when there is a missing barrier between a page
table modification and a memory access that has an implicit dependency on the
page table modification. In this case the page table modification would be the
DMAR PTE write that maps the PRP1 buffer, while the memory access would be the
NVMe controller read triggered by appending the write command to the
submission queue. But I would be surprised if that kind of issue is at play
here given the stronger ordering of the x86 memory model.