From: Warner Losh
Date: Wed, 25 May 2022 09:29:26 -0600
Subject: Re: nvme INVALID_FIELD in dmesg.boot
To: matti k
Cc: Alexander Motin, Matteo Riondato, FreeBSD Current, Jim Harris
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current


On Wed, May 25, 2022 at 8:18 AM matti k <mattik@gwsit.com.au> wrote:

> On Wed, 25 May 2022 09:58:54 -0400
> Alexander Motin <mav@FreeBSD.org> wrote:
>
> > On 25.05.2022 08:25, Matteo Riondato wrote:
> > > My dmesg.boot contains the following entries containing
> > > "INVALID_FIELD" about nvme (I use nda(4) for my nvme disks, with
> > > hw.nvme.use_nvd=0 in loader.conf):
> > >
> > > trismegistus ~ % grep -e 'nvme[0-9]\?' /var/run/dmesg.boot
> > > nvme0: <Intel DC PC4500> mem 0xb8610000-0xb8613fff irq 40 at device 0.0 numa-domain 0 on pci7
> > > nvme1: <Intel DC PC4500> mem 0xb8510000-0xb8513fff irq 47 at device 0.0 numa-domain 0 on pci8
> > > nvme2: <Intel DC PC4500> mem 0xc5e10000-0xc5e13fff irq 48 at device 0.0 numa-domain 0 on pci10
> > > nvme3: <Intel DC PC4500> mem 0xc5d10000-0xc5d13fff irq 55 at device 0.0 numa-domain 0 on pci11
> > > nvme0: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > > nvme0: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > > nvme1: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > > nvme1: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > > nvme2: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > > nvme2: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > > nvme3: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > > nvme3: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > > nda0 at nvme0 bus 0 scbus16 target 0 lun 1
> > > nda0: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > > nda1 at nvme1 bus 0 scbus17 target 0 lun 1
> > > nda1: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > > nda2 at nvme2 bus 0 scbus18 target 0 lun 1
> > > nda2: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > > nda3 at nvme3 bus 0 scbus19 target 0 lun 1
> > > nda3: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > >
> > > The disks seem to work fine, from what I can tell.
> > >
> > > Are the "INVALID_FIELD" messages harmless, or can they be avoided
> > > with some tuning, or maybe with some patch?
> >
> > Those messages mean that driver tried to enable certain types of
> > asynchronous events, but probably the hardware does not support some
> > of those.  If you wish to experiment we could try to mask some of the
> > bits in nvme_ctrlr_configure_aer() function to find out which one
> > exactly, but for discontinued drives 4-5 years old it might not have
> > too much sense.  It should not be critical unless you either overheat
> > them, or somehow else they fail and wish to report it.
> >
>
> I am intrigued to how you guru's know this, is it because you know
> the code well enough?

SET FEATURES (opcode 9) feature 0xb is indeed Asynchronous Event Configuration.
cdw11 = 0x31f is the OR of:
SMART warning for available spare (0x1)
SMART warning for temperature (0x2)
SMART warning for device reliability (0x4)
SMART warning for the media being read-only (0x8)
SMART warning for volatile memory backup (0x10)
Namespace attribute change events (0x100)
Firmware activation events (0x200)

I wonder which one of those it doesn't like. My reading of the standard suggests that those
should always be supported for a 1.2 or later drive... Though maybe with the possible
exception of the volatile memory backup, so let me do some digging here...

We can check the last two items against the OAES (Optional Asynchronous Events Supported) field
of the controller identification data. This is bytes 95:92, which, if I'm counting right, is the
last word on the 040: line of the nvmecontrol identify -x nvmeX output:

040: 4e474e4b 30303150 000cca07 00230000 00010200 005b8d80 0030d400 00000100
--------------------------------------------------------------------^^^^^^^^

It looks like we don't currently test these bits before we enable the last two events: we set them
unconditionally for >= 1.2, and for >= 1.2 we should probably check these OAES bits first.

Would you be able to test a fix for this?

Warner