Re: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes
Date: Mon, 11 Apr 2022 17:13:52 UTC
A favor to ask.. this cutting of the "git: " string from the commit message subject when replying cuases me to miss classify mail messages. It only occurs very rarely, so I have to assume it is not an automatic thing. Can this behavior be curtailed please? Thanks, Rod [ Charset UTF-8 unsupported, converting... ] > On Sat, Apr 9, 2022 at 5:15 PM Warner Losh <imp@bsdimp.com> wrote: > > > > > > > On Sat, Apr 9, 2022 at 4:50 PM Ravi Pokala <rpokala@freebsd.org> wrote: > > > >> -----Original Message----- > >> From: <owner-src-committers@freebsd.org> on behalf of Warner Losh > >> <imp@FreeBSD.org> > >> Date: 2022-04-08, Friday at 22:06 > >> To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>, > >> <dev-commits-src-main@FreeBSD.org> > >> Subject: git: 214df80a9cb3 - main - nvme: new define for size of host > >> memory buffer sizes > >> > >> The branch main has been updated by imp: > >> > >> URL: > >> https://cgit.FreeBSD.org/src/commit/?id=214df80a9cb3e95a140b13af7d19deec2bbfae76 > >> > >> commit 214df80a9cb3e95a140b13af7d19deec2bbfae76 > >> Author: Warner Losh <imp@FreeBSD.org> > >> AuthorDate: 2022-04-09 05:01:06 +0000 > >> Commit: Warner Losh <imp@FreeBSD.org> > >> CommitDate: 2022-04-09 05:05:25 +0000 > >> > >> nvme: new define for size of host memory buffer sizes > >> > >> The nvme spec defines the various fields that specify sizes for > >> host > >> memory buffers in terms of 4096 chunks. So, rather than use a > >> bare 4096 > >> here, use NVME_HMB_UNITS. This is explicitly not the host page > >> size of > >> 4096, nor the default memory page size (mps) of the NVMe drive, > >> but its > >> own thing and needs its own define. > >> > >> Hi Warner, > >> > >> Are you sure about that? > >> > > > > NVMe-1.4, Figure 297: Host Memory Buffer ? Host Memory Buffer Descriptor > >> Entry > >> > >> | Buffer Size (BSIZE): Indicates the number of contiguous > >> | memory page size (CC.MPS) units for this descriptor. > >> | > >> | Buffer Address (BADD): Indicates the host memory address for > >> | this descriptor aligned to the memory page size (CC.MPS). > >> | The lower bits (n:0) of this field indicate the offset > >> | within the memory page is 0h (e.g., if the memory page size > >> | is 4 KiB, then bits 11:00 shall be 0h; if the memory page > >> | size is 8 KiB, then bits 12:00 shall be 0h). > >> > >> They both reference mps, not 4096 bytes. > >> > > > > So, some I'm 100% sure about. There's one that was previously incorrectly > > hard wired to > > 4k. More below. > > > > From Table 275 of Rev 2.0: > > > > Host Memory Buffer Preferred Size (HMPRE): This field indicates > > the preferred size that the host is requested to allocate for the > > Host Memory Buffer feature in 4 KiB units. This value shall be > > greater than or equal to the Host Memory Buffer Minimum Size. > > If this field is non-zero, then the Host Memory Buffer feature is > > supported. If this field is cleared to 0h, then the Host Memory > > Buffer feature is not supported. > > > > Host Memory Buffer Minimum Size (HMMIN): This field indicates > > the minimum size that the host is requested to allocate for the > > Host Memory Buffer feature in 4 KiB units. If this field is cleared > > to 0h, then the host is requested to allocate any amount of host > > memory possible up to the HMPRE value. > > > > Host Memory Buffer Minimum Descriptor Entry Size (HMMINDS): > > This field indicates the minimum usable size of a Host Memory > > Buffer Descriptor Entry in 4 KiB units. If this field is cleared to 0h, > > then the controller does not indicate any limitations on the Host > > Memory Buffer Descriptor Entry size. > > > > These are the hmmin, hmminds and hmpre fields of 'cdata' in the > > driver. > > > > diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c > >> index 95a2b5c4285d..6996b3151b0d 100644 > >> --- a/sys/dev/nvme/nvme_ctrlr.c > >> +++ b/sys/dev/nvme/nvme_ctrlr.c > >> @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_controller > >> *ctrlr) > >> max = (uint64_t)physmem * PAGE_SIZE / 20; > >> TUNABLE_UINT64_FETCH("hw.nvme.hmb_max", &max); > >> > > > > max is a percent of available memory... > > > > > >> - min = (long long unsigned)ctrlr->cdata.hmmin * 4096; > >> + min = (long long unsigned)ctrlr->cdata.hmmin * NVME_HMB_UNITS; > >> if (max == 0 || max < min) > >> return; > >> - pref = MIN((long long unsigned)ctrlr->cdata.hmpre * 4096, max); > >> - minc = MAX(ctrlr->cdata.hmminds * 4096, PAGE_SIZE); > >> + pref = MIN((long long unsigned)ctrlr->cdata.hmpre * > >> NVME_HMB_UNITS, max); > >> + minc = MAX(ctrlr->cdata.hmminds * NVME_HMB_UNITS, PAGE_SIZE); > >> > > > > So for all of these, we're good. They are in 4kiB chunks. > > > > > >> if (min > 0 && ctrlr->cdata.hmmaxd > 0) > >> minc = MAX(minc, min / ctrlr->cdata.hmmaxd); > >> ctrlr->hmb_chunk = pref; > >> @@ -1023,7 +1023,7 @@ again: > >> for (i = 0; i < ctrlr->hmb_nchunks; i++) { > >> ctrlr->hmb_desc_vaddr[i].addr = > >> htole64(ctrlr->hmb_chunks[i].hmbc_paddr); > >> - ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk > >> / 4096); > >> + ctrlr->hmb_desc_vaddr[i].size = htole32(ctrlr->hmb_chunk > >> / NVME_HMB_UNITS); > >> > > > > This one, you are correct is wrong. I'll fix it when I bring in the > > changes to fully support > > MPS != 0. For MPS == 0, which are the only drives this driver supports > > correctly, this > > is a nop. It was wrong before, though. The text you quoted is correct > > about this. There > > are a couple of PAGE_SIZEs tinbetween these two chunks hat should be > > ctrlr->min_page_size instead (since that's the page size we're really > > using, not the > > host's). But PAGE_SIZE != 4k doesn't currently work with the nvme driver > > due to > > confusion like this. > > > > https://reviews.freebsd.org/D34871 should address that. > > https://reviews.freebsd.org/D34865 through > https://reviews.freebsd.org/D34873 are my current > series. With them applied, I'm able to boot with 16k pages on a ZFS-based > nvme arm64 system. > I believe it would also allow us to use different drive page sizes too, but > I haven't tried to figure > that out :)... > > Warner > > > > We also currently set the MPS field in the CC incorrectly when it isn't 0. > > > > Good catch. I'll update my pending changes. > > > > Warner > > > > > >> } > >> bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr->hmb_desc_map, > >> BUS_DMASYNC_PREWRITE); > >> > >> > >> -- Rod Grimes rgrimes@freebsd.org