Re: nvme INVALID_FIELD in dmesg.boot

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 25 May 2022 15:49:56 UTC
On Wed, May 25, 2022 at 9:39 AM Matteo Riondato <matteo@freebsd.org> wrote:

> On 2022-05-25 at 11:29 EDT, Warner Losh <imp@bsdimp.com> wrote:
> >
> >SET FEATURES (opcode 9) feature 0xb is indeed async event
> >configuration.
> >0x31f is:
> >SMART WARNING for available spares (0x1)
> >SMART warning for temperature (0x2)
> >SMART WARNING for device reliability (0x4)
> >SMART WARNING for being read only (0x8)
> >SMART WARNING for volatile memory backup (0x10)
> >Namespace attribute change events (0x100)
> >Firmware activation events (0x200)
> >
> >I wonder which one of those it doesn't like. My reading of the standard
> >suggests that those should always be supported for a 1.2 and later
> >drive... Though maybe with the possible exception of the volatile
> >memory backup, so let me do some digging here...
> >
> >We can get the last two items from OAES field of the controller
> >identification data. This is bytes 95:92, which if I'm counting right
> >is the last word on the 040: line in the nvmecontrol identify -x nvmeX
> >command:
> >
> >040: 4e474e4b 30303150 000cca07 00230000 00010200 005b8d80 0030d400 00000100
> >--------------------------------------------------------------------^^^^^^^^
>
> On my system:
>
> 040: 31564456 30373130 5cd2e400 00000500 00010200 001e8480 002dc6c0 00000200
>

Yeah, your drive reports 0x200 (firmware activation events only) and we send
0x300, so maybe that's the cause of the message...
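
To make the arithmetic concrete, here is a rough sketch of the check being
discussed: read OAES from the last word of the 040: line and request only the
optional events it advertises. This is not the driver code or the actual fix.
The function and constant names are made up for illustration, it assumes each
word in the dump can be read directly as a 32-bit value, and it keeps all five
SMART warning bits unconditionally (which may not be right for the volatile
memory backup bit).

```python
# Bits of the async event configuration (SET FEATURES, feature 0x0b),
# as listed earlier in this thread. Names are illustrative only.
SMART_BITS = 0x1F        # 0x01..0x10: the five SMART warning bits
NS_ATTR_CHANGE = 0x100   # namespace attribute change events
FW_ACTIVATION = 0x200    # firmware activation events
OPTIONAL_BITS = NS_ATTR_CHANGE | FW_ACTIVATION


def oaes_from_dump_line(line: str) -> int:
    """Return the last 32-bit word of the '040:' dump line (bytes 95:92)."""
    offset, *words = line.split()
    if offset != "040:" or len(words) != 8:
        raise ValueError("expected the 040: line with eight 32-bit words")
    return int(words[-1], 16)


def async_event_config(oaes: int) -> int:
    """All SMART warnings plus only the optional events OAES advertises."""
    return SMART_BITS | (oaes & OPTIONAL_BITS)


line = ("040: 31564456 30373130 5cd2e400 00000500 "
        "00010200 001e8480 002dc6c0 00000200")
oaes = oaes_from_dump_line(line)
print(hex(oaes), hex(async_event_config(oaes)))  # 0x200 0x21f
```

With the drive above, that requests 0x21f instead of the unconditional 0x31f.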


> (same for all nvmeX, as far as I can tell)
>
> >It looks like we don't currently test these bits before we add the last
> >two (we do it unconditionally for >= 1.2, and maybe we should check
> >these bits >= 1.2).
> >
> >Would you be able to test a fix for this?
>
> Yes, I would be happy to, but I cannot do it for a couple of weeks
> (running simulations for a deadline).
>

There's no real rush... Your system will be fine without these events given
what I think you are doing with it. You might want to check the SMART log page
to see if any of the drives have indicators of trouble... but most trouble
you'd care about would likely torpedo your simulation very shortly after it
happens, so even that likely isn't strictly required.

Warner