Re: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes

From: Warner Losh <imp_at_bsdimp.com>
Date: Sat, 09 Apr 2022 23:15:37 UTC
On Sat, Apr 9, 2022 at 4:50 PM Ravi Pokala <rpokala@freebsd.org> wrote:

> -----Original Message-----
> From: <owner-src-committers@freebsd.org> on behalf of Warner Losh
> <imp@FreeBSD.org>
> Date: 2022-04-08, Friday at 22:06
> To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>,
> <dev-commits-src-main@FreeBSD.org>
> Subject: git: 214df80a9cb3 - main - nvme: new define for size of host
> memory buffer sizes
>
>     The branch main has been updated by imp:
>
>     URL:
> https://cgit.FreeBSD.org/src/commit/?id=214df80a9cb3e95a140b13af7d19deec2bbfae76
>
>     commit 214df80a9cb3e95a140b13af7d19deec2bbfae76
>     Author:     Warner Losh <imp@FreeBSD.org>
>     AuthorDate: 2022-04-09 05:01:06 +0000
>     Commit:     Warner Losh <imp@FreeBSD.org>
>     CommitDate: 2022-04-09 05:05:25 +0000
>
>         nvme: new define for size of host memory buffer sizes
>
>         The nvme spec defines the various fields that specify sizes for
>         host memory buffers in terms of 4096 chunks. So, rather than use
>         a bare 4096 here, use NVME_HMB_UNITS. This is explicitly not the
>         host page size of 4096, nor the default memory page size (mps) of
>         the NVMe drive, but its own thing and needs its own define.
>
> Hi Warner,
>
> Are you sure about that?
>

> NVMe-1.4, Figure 297: Host Memory Buffer – Host Memory Buffer Descriptor
> Entry
>
> | Buffer Size (BSIZE): Indicates the number of contiguous
> | memory page size (CC.MPS) units for this descriptor.
> |
> | Buffer Address (BADD): Indicates the host memory address for
> | this descriptor aligned to the memory page size (CC.MPS).
> | The lower bits (n:0) of this field indicate the offset
> | within the memory page is 0h (e.g., if the memory page size
> | is 4 KiB, then bits 11:00 shall be 0h; if the memory page
> | size is 8 KiB, then bits 12:00 shall be 0h).
>
> They both reference mps, not 4096 bytes.
>

So, some of these I'm 100% sure about. There's one that was previously
incorrectly hard-wired to 4k. More below.

From Table 275 of Rev 2.0:

Host Memory Buffer Preferred Size (HMPRE): This field indicates
the preferred size that the host is requested to allocate for the
Host Memory Buffer feature in 4 KiB units. This value shall be
greater than or equal to the Host Memory Buffer Minimum Size.
If this field is non-zero, then the Host Memory Buffer feature is
supported. If this field is cleared to 0h, then the Host Memory
Buffer feature is not supported.

Host Memory Buffer Minimum Size (HMMIN): This field indicates
the minimum size that the host is requested to allocate for the
Host Memory Buffer feature in 4 KiB units. If this field is cleared
to 0h, then the host is requested to allocate any amount of host
memory possible up to the HMPRE value.

Host Memory Buffer Minimum Descriptor Entry Size (HMMINDS):
This field indicates the minimum usable size of a Host Memory
Buffer Descriptor Entry in 4 KiB units. If this field is cleared to 0h,
then the controller does not indicate any limitations on the Host
Memory Buffer Descriptor Entry size.

These are the hmmin, hmminds and hmpre fields of 'cdata' in the
driver.
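
For illustration, a minimal sketch of how those fields translate into byte
counts once NVME_HMB_UNITS exists (struct nvme_controller_data and its hmpre
field are the driver's cdata type; the helper itself is hypothetical, the
driver just does the multiplication inline as in the hunk below):

    /*
     * Sketch only: these Identify Controller fields are in 4 KiB units
     * regardless of CC.MPS, hence a dedicated define.
     */
    #define NVME_HMB_UNITS 4096

    static uint64_t
    nvme_hmb_pref_bytes(const struct nvme_controller_data *cdata)
    {
    	/* Preferred HMB size in bytes; hmmin and hmminds scale the same way. */
    	return ((uint64_t)cdata->hmpre * NVME_HMB_UNITS);
    }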

>     diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
>     index 95a2b5c4285d..6996b3151b0d 100644
>     --- a/sys/dev/nvme/nvme_ctrlr.c
>     +++ b/sys/dev/nvme/nvme_ctrlr.c
>     @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_controller *ctrlr)
>         max = (uint64_t)physmem * PAGE_SIZE / 20;
>         TUNABLE_UINT64_FETCH("hw.nvme.hmb_max", &max);
>

max defaults to 5% of physical memory (physmem * PAGE_SIZE / 20) and can be
overridden with the hw.nvme.hmb_max tunable...


>     -   min = (long long unsigned)ctrlr->cdata.hmmin * 4096;
>     +   min = (long long unsigned)ctrlr->cdata.hmmin * NVME_HMB_UNITS;
>         if (max == 0 || max < min)
>                 return;
>     -   pref = MIN((long long unsigned)ctrlr->cdata.hmpre * 4096, max);
>     -   minc = MAX(ctrlr->cdata.hmminds * 4096, PAGE_SIZE);
>     +   pref = MIN((long long unsigned)ctrlr->cdata.hmpre * NVME_HMB_UNITS, max);
>     +   minc = MAX(ctrlr->cdata.hmminds * NVME_HMB_UNITS, PAGE_SIZE);
>

So for all of these, we're good. They are in 4 KiB chunks.


>         if (min > 0 && ctrlr->cdata.hmmaxd > 0)
>                 minc = MAX(minc, min / ctrlr->cdata.hmmaxd);
>         ctrlr->hmb_chunk = pref;
>     @@ -1023,7 +1023,7 @@ again:
>         for (i = 0; i < ctrlr->hmb_nchunks; i++) {
>                 ctrlr->hmb_desc_vaddr[i].addr =
>                     htole64(ctrlr->hmb_chunks[i].hmbc_paddr);
>     -           ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk / 4096);
>     +           ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk / NVME_HMB_UNITS);
>

This one, you are correct, is wrong. I'll fix it when I bring in the changes
to fully support MPS != 0. For MPS == 0, which covers the only drives this
driver currently supports correctly, this is a nop. It was wrong before,
though; the text you quoted is correct about this. There are also a couple of
PAGE_SIZEs in between these two chunks that should be ctrlr->min_page_size
instead (since that's the page size we're really using, not the host's). But
PAGE_SIZE != 4k doesn't currently work with the nvme driver due to confusion
like this.
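
As a sketch of what the MPS-aware version of that hunk would look like
(assuming ctrlr->min_page_size holds the page size actually programmed into
CC.MPS; this is not the committed code):

    	/*
    	 * BSIZE is in CC.MPS units, so divide by the page size we
    	 * programmed into CC.MPS rather than a bare 4096.  With
    	 * MPS == 0 (4 KiB pages) the two happen to be equal, which is
    	 * why the old code was a nop there.
    	 */
    	for (i = 0; i < ctrlr->hmb_nchunks; i++) {
    		ctrlr->hmb_desc_vaddr[i].addr =
    		    htole64(ctrlr->hmb_chunks[i].hmbc_paddr);
    		ctrlr->hmb_desc_vaddr[i].size =
    		    htole32(ctrlr->hmb_chunk / ctrlr->min_page_size);
    	}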

We also currently set the MPS field in the CC incorrectly when it isn't 0.
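
For reference, CC.MPS encodes the memory page size as 2^(12 + MPS), so the
conversion in both directions is roughly the following (sketch only; the
helper names are made up, and flsl() is the kernel's find-last-set):

    /* Memory page size in bytes for a given CC.MPS value (0 -> 4 KiB, 1 -> 8 KiB, ...). */
    static uint32_t
    nvme_mps_to_page_size(uint32_t mps)
    {
    	return (1u << (12 + mps));
    }

    /* Inverse: the MPS value to program for a power-of-two page size >= 4 KiB. */
    static uint32_t
    nvme_page_size_to_mps(uint32_t page_size)
    {
    	return (flsl(page_size) - 1 - 12);
    }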

Good catch. I'll update my pending changes.

Warner


>         }
>         bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr->hmb_desc_map,
>             BUS_DMASYNC_PREWRITE);
>