Re: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes
Date: Sat, 09 Apr 2022 23:15:37 UTC
On Sat, Apr 9, 2022 at 4:50 PM Ravi Pokala <rpokala@freebsd.org> wrote:

> -----Original Message-----
> From: <owner-src-committers@freebsd.org> on behalf of Warner Losh <imp@FreeBSD.org>
> Date: 2022-04-08, Friday at 22:06
> To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>, <dev-commits-src-main@FreeBSD.org>
> Subject: git: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes
>
> The branch main has been updated by imp:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=214df80a9cb3e95a140b13af7d19deec2bbfae76
>
> commit 214df80a9cb3e95a140b13af7d19deec2bbfae76
> Author:     Warner Losh <imp@FreeBSD.org>
> AuthorDate: 2022-04-09 05:01:06 +0000
> Commit:     Warner Losh <imp@FreeBSD.org>
> CommitDate: 2022-04-09 05:05:25 +0000
>
>     nvme: new define for size of host memory buffer sizes
>
>     The nvme spec defines the various fields that specify sizes for host
>     memory buffers in terms of 4096 chunks. So, rather than use a bare 4096
>     here, use NVME_HMB_UNITS. This is explicitly not the host page size of
>     4096, nor the default memory page size (mps) of the NVMe drive, but its
>     own thing and needs its own define.
>
> Hi Warner,
>
> Are you sure about that?
>
> NVMe-1.4, Figure 297: Host Memory Buffer – Host Memory Buffer Descriptor Entry
>
> | Buffer Size (BSIZE): Indicates the number of contiguous
> | memory page size (CC.MPS) units for this descriptor.
> |
> | Buffer Address (BADD): Indicates the host memory address for
> | this descriptor aligned to the memory page size (CC.MPS).
> | The lower bits (n:0) of this field indicate the offset
> | within the memory page is 0h (e.g., if the memory page size
> | is 4 KiB, then bits 11:00 shall be 0h; if the memory page
> | size is 8 KiB, then bits 12:00 shall be 0h).
>
> They both reference mps, not 4096 bytes.

So, some I'm 100% sure about. There's one that was previously incorrectly
hard-wired to 4k. More below.

From Table 275 of Rev 2.0:

    Host Memory Buffer Preferred Size (HMPRE): This field indicates the
    preferred size that the host is requested to allocate for the Host
    Memory Buffer feature in 4 KiB units. This value shall be greater than
    or equal to the Host Memory Buffer Minimum Size. If this field is
    non-zero, then the Host Memory Buffer feature is supported. If this
    field is cleared to 0h, then the Host Memory Buffer feature is not
    supported.

    Host Memory Buffer Minimum Size (HMMIN): This field indicates the
    minimum size that the host is requested to allocate for the Host
    Memory Buffer feature in 4 KiB units. If this field is cleared to 0h,
    then the host is requested to allocate any amount of host memory
    possible up to the HMPRE value.

    Host Memory Buffer Minimum Descriptor Entry Size (HMMINDS): This field
    indicates the minimum usable size of a Host Memory Buffer Descriptor
    Entry in 4 KiB units. If this field is cleared to 0h, then the
    controller does not indicate any limitations on the Host Memory Buffer
    Descriptor Entry size.

These are the hmmin, hmminds and hmpre fields of 'cdata' in the driver.
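For illustration, a minimal sketch of that unit conversion (the cdata field
names and NVME_HMB_UNITS match the driver; the helper itself is hypothetical
and not in the tree):

    #include <stdint.h>

    #define NVME_HMB_UNITS  4096    /* HMPRE/HMMIN/HMMINDS are in 4 KiB units */

    /*
     * Hypothetical helper: convert an Identify Controller HMB size field
     * (expressed in 4 KiB units per the spec) to a byte count. Widen
     * before multiplying so a large field value can't overflow 32 bits.
     */
    static inline uint64_t
    nvme_hmb_units_to_bytes(uint32_t units)
    {
            return ((uint64_t)units * NVME_HMB_UNITS);
    }

    /* e.g.: min  = nvme_hmb_units_to_bytes(ctrlr->cdata.hmmin);
     *       pref = nvme_hmb_units_to_bytes(ctrlr->cdata.hmpre); */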
> diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
> index 95a2b5c4285d..6996b3151b0d 100644
> --- a/sys/dev/nvme/nvme_ctrlr.c
> +++ b/sys/dev/nvme/nvme_ctrlr.c
> @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_controller *ctrlr)
>         max = (uint64_t)physmem * PAGE_SIZE / 20;
>         TUNABLE_UINT64_FETCH("hw.nvme.hmb_max", &max);

max is a percent of available memory...

> -       min = (long long unsigned)ctrlr->cdata.hmmin * 4096;
> +       min = (long long unsigned)ctrlr->cdata.hmmin * NVME_HMB_UNITS;
>         if (max == 0 || max < min)
>                 return;
> -       pref = MIN((long long unsigned)ctrlr->cdata.hmpre * 4096, max);
> -       minc = MAX(ctrlr->cdata.hmminds * 4096, PAGE_SIZE);
> +       pref = MIN((long long unsigned)ctrlr->cdata.hmpre * NVME_HMB_UNITS, max);
> +       minc = MAX(ctrlr->cdata.hmminds * NVME_HMB_UNITS, PAGE_SIZE);

So for all of these, we're good. They are in 4 KiB chunks.

>         if (min > 0 && ctrlr->cdata.hmmaxd > 0)
>                 minc = MAX(minc, min / ctrlr->cdata.hmmaxd);
>         ctrlr->hmb_chunk = pref;
> @@ -1023,7 +1023,7 @@ again:
>         for (i = 0; i < ctrlr->hmb_nchunks; i++) {
>                 ctrlr->hmb_desc_vaddr[i].addr =
>                     htole64(ctrlr->hmb_chunks[i].hmbc_paddr);
> -               ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk / 4096);
> +               ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk / NVME_HMB_UNITS);

This one, you are correct, is wrong. I'll fix it when I bring in the changes
to fully support MPS != 0. For MPS == 0, which are the only drives this
driver supports correctly, this is a nop. It was wrong before, though. The
text you quoted is correct about this.

There are a couple of PAGE_SIZEs in between these two chunks that should be
ctrlr->min_page_size instead (since that's the page size we're really using,
not the host's). But PAGE_SIZE != 4k doesn't currently work with the nvme
driver due to confusion like this. We also currently set the MPS field in
the CC incorrectly when it isn't 0.

Good catch. I'll update my pending changes.

Warner

>         }
>         bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr->hmb_desc_map,
>             BUS_DMASYNC_PREWRITE);
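A sketch of the MPS-aware descriptor sizing described above, assuming
ctrlr->min_page_size holds the page size in bytes implied by the programmed
CC.MPS value; the helper is illustrative, not the committed fix:

    #include <stdint.h>

    /*
     * Hypothetical helper: compute the BSIZE value for one HMB descriptor.
     * BSIZE counts CC.MPS-sized units, i.e. units of 2^(12 + MPS) bytes,
     * not fixed 4 KiB units. With MPS == 0, mps_bytes == 4096, which is
     * why the old hard-wired divisor happened to be a nop on the drives
     * the driver supports today.
     */
    static inline uint32_t
    nvme_hmb_bsize(uint64_t chunk_bytes, uint32_t mps_bytes)
    {
            return ((uint32_t)(chunk_bytes / mps_bytes));
    }

In the descriptor loop quoted above, the size would then be set with
htole32(nvme_hmb_bsize(ctrlr->hmb_chunk, ctrlr->min_page_size)).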