Re: nvme INVALID_FIELD in dmesg.boot

From: matti k <mattik_at_gwsit.com.au>
Date: Wed, 25 May 2022 14:17:15 UTC
On Wed, 25 May 2022 09:58:54 -0400
Alexander Motin <mav@FreeBSD.org> wrote:

> On 25.05.2022 08:25, Matteo Riondato wrote:
> > My dmesg.boot contains the following "INVALID_FIELD" entries
> > about nvme (I use nda(4) for my nvme disks, with
> > hw.nvme.use_nvd=0 in loader.conf):
> > 
> > trismegistus ~ % grep -e 'nvme[0-9]\?' /var/run/dmesg.boot
> > nvme0: <Intel DC PC4500> mem 0xb8610000-0xb8613fff irq 40 at device 0.0 numa-domain 0 on pci7
> > nvme1: <Intel DC PC4500> mem 0xb8510000-0xb8513fff irq 47 at device 0.0 numa-domain 0 on pci8
> > nvme2: <Intel DC PC4500> mem 0xc5e10000-0xc5e13fff irq 48 at device 0.0 numa-domain 0 on pci10
> > nvme3: <Intel DC PC4500> mem 0xc5d10000-0xc5d13fff irq 55 at device 0.0 numa-domain 0 on pci11
> > nvme0: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > nvme0: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > nvme1: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > nvme1: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > nvme2: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > nvme2: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > nvme3: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b cdw11:0000031f
> > nvme3: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
> > nda0 at nvme0 bus 0 scbus16 target 0 lun 1
> > nda0: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > nda1 at nvme1 bus 0 scbus17 target 0 lun 1
> > nda1: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > nda2 at nvme2 bus 0 scbus18 target 0 lun 1
> > nda2: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > nda3 at nvme3 bus 0 scbus19 target 0 lun 1
> > nda3: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
> > 
> > The disks seem to work fine, from what I can tell.
> > 
> > Are the "INVALID_FIELD" messages harmless, or can they be avoided
> > with some tuning, or maybe with some patch?
> 
> Those messages mean that the driver tried to enable certain types of
> asynchronous events, but the hardware probably does not support some
> of them.  If you wish to experiment, we could try masking some of the
> bits in the nvme_ctrlr_configure_aer() function to find out exactly
> which one, but for discontinued drives 4-5 years old it might not make
> much sense.  It should not be critical unless you overheat them, or
> they fail in some other way and want to report it.
> 

I am intrigued as to how you gurus know this. Is it because you know
the code well enough?
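
For what it is worth, the SET FEATURES line in the quoted log can be
decoded by hand. Below is a minimal sketch in C, assuming the bit layout
I read in the NVMe base specification for the Asynchronous Event
Configuration feature (feature ID 0x0b in cdw10). The table and the
function name decode_aer_cdw11 are my own illustration, not code taken
from the FreeBSD driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed meaning of the cdw11 bits for Asynchronous Event Configuration
 * (feature 0x0b), per my reading of the NVMe base specification.
 * Hypothetical helper table, not from sys/dev/nvme.
 */
struct aer_bit {
    unsigned bit;
    const char *name;
};

static const struct aer_bit aer_bits[] = {
    { 0, "critical warning: available spare below threshold" },
    { 1, "critical warning: temperature threshold crossed" },
    { 2, "critical warning: reliability degraded" },
    { 3, "critical warning: media placed in read-only mode" },
    { 4, "critical warning: volatile memory backup failed" },
    { 8, "namespace attribute notices" },
    { 9, "firmware activation notices" },
};

/* Print each known event bit set in cdw11 and return how many were set. */
int decode_aer_cdw11(uint32_t cdw11)
{
    int count = 0;

    for (size_t i = 0; i < sizeof(aer_bits) / sizeof(aer_bits[0]); i++) {
        if (cdw11 & (UINT32_C(1) << aer_bits[i].bit)) {
            printf("bit %u: %s\n", aer_bits[i].bit, aer_bits[i].name);
            count++;
        }
    }
    return count;
}
```

Calling decode_aer_cdw11(0x0000031f) reports bits 0-4, 8 and 9, so the
driver appears to have asked for all five critical-warning events plus
namespace attribute and firmware activation notices. The (00/02) in the
reply would then be the generic "Invalid Field in Command" status, which
fits a drive rejecting one or more of those bits.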