Re: FreeBSD hugepages
- In reply to: Jake Freeland : "Re: FreeBSD hugepages"
Date: Thu, 25 Jul 2024 23:20:42 UTC
On 7/25/24 18:11, Jake Freeland wrote:
> On 7/25/24 17:40, Mark Johnston wrote:
>> On Thu, Jul 25, 2024 at 06:34:43PM -0400, Mark Johnston wrote:
>>> On Thu, Jul 25, 2024 at 04:11:22PM -0500, Jake Freeland wrote:
>>>> On 7/25/24 15:18, Mark Johnston wrote:
>>>>> On Thu, Jul 25, 2024 at 02:47:16PM -0500, Jake Freeland wrote:
>>>>>> On 7/25/24 14:02, Konstantin Belousov wrote:
>>>>>>> On Thu, Jul 25, 2024 at 01:46:17PM -0500, Jake Freeland wrote:
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I have been steadily working on bringing Data Plane Development Kit
>>>>>>>> (DPDK) on FreeBSD up to date with the Linux version. The most
>>>>>>>> significant hurdle so far has been supporting concurrent DPDK
>>>>>>>> processes, each with their own contiguous memory regions.
>>>>>>>>
>>>>>>>> These contiguous regions are used by DPDK as a heap for allocating
>>>>>>>> DMA buffers and other miscellaneous resources. Retrieving the
>>>>>>>> underlying memory and mapping these regions is currently different
>>>>>>>> on Linux and FreeBSD:
>>>>>>>>
>>>>>>>> On Linux, hugepages are fetched from the kernel's pre-allocated
>>>>>>>> hugepage pool and are mapped into virtual address space on DPDK
>>>>>>>> initialization. Since the hugepages exist in a pool, multiple
>>>>>>>> processes can reserve their own hugepages and operate concurrently.
>>>>>>>>
>>>>>>>> On FreeBSD, DPDK uses an in-house contigmem kernel module that
>>>>>>>> reserves a large contiguous region of memory on load. During DPDK
>>>>>>>> initialization, the entire region is mapped into virtual address
>>>>>>>> space. This leaves no memory for another independent DPDK process,
>>>>>>>> so only one process can operate at a time.
>>>>>>>>
>>>>>>>> I could modify the DPDK contigmem module to mimic Linux's
>>>>>>>> hugepages, but I thought it would be better to integrate and
>>>>>>>> upstream a hugepage-like interface directly in the FreeBSD kernel
>>>>>>>> source. I am writing this email to see if anyone has any advice on
>>>>>>>> the matter. I did not see any previous attempts at this in
>>>>>>>> Phabricator or the commit log, but it is possible that I missed it.
>>>>>>>> I have read about transparent superpage promotion, but that seems
>>>>>>>> like a different mechanism altogether.
>>>>>>>>
>>>>>>>> At a quick glance, the implementation seems straightforward: read
>>>>>>>> some loader tunables, allocate persistent hugepages at boot time,
>>>>>>>> and create a pseudo filesystem that supports creating and mapping
>>>>>>>> hugepages. I could be underestimating the magnitude of this task,
>>>>>>>> but that is why I'm asking for thoughts and advice :)
>>>>>>>>
>>>>>>>> For reference, here is Linux's documentation on hugepages:
>>>>>>>> https://docs.kernel.org/admin-guide/mm/hugetlbpage.html
>>>>>>> Are posix shm largepages objects enough (they were developed to
>>>>>>> support DPDK)? Look for shm_create_largepage(3).
>>>>>> Yes, shm_create_largepage(2) looks promising, but I would like the
>>>>>> ability to allocate these largepages at boot time when memory
>>>>>> fragmentation is at a minimum. Perhaps a couple sysctl tunables could
>>>>>> be added onto the vm.largepages node to specify a pagesize and
>>>>>> allocate some number of pages at boot?
>>>>> We could add an rc script which creates named largepage objects. This
>>>>> can be done using the posixshmcontrol utility. That might not be early
>>>>> enough during boot for some purposes. In that case, we could have a
>>>>> module which creates such objects from within the kernel.
>>>>> This is pretty straightforward to do; I wrote a dumb version of this
>>>>> for a mips-specific project a few years ago, feel free to take code or
>>>>> inspiration from it: https://people.freebsd.org/~markj/tlbdemo.c
>>>> Looks simple enough. Thanks for the example code.
>>>>
>>>>>> It seems Linux had an interface similar to shm_create_largepage(2)
>>>>>> back in v2.5, but they removed it in favor of their hugetlbfs
>>>>>> filesystem. It would be nice to stay close to the file-backed Linux
>>>>>> interface to maximize code sharing in userspace. It looks like the
>>>>>> foundation for hugepages is there, but the interface for allocation
>>>>>> and access needs to be extended.
>>>>> POSIX shm objects have most of the properties one would want, I'd
>>>>> expect, save the ability to access them via standard syscalls. What
>>>>> else is missing besides the ability to reserve memory at boot time?
>>>> Most notably, I would like the ability to allocate pages in a specific
>>>> NUMA domain.
>>> I thought this was already supported, but it seems not...
>> Thinking a bit more, I'm pretty sure I had just been using something
>> like
>>
>> $ cpuset -n prefer:<domain> posixshmcontrol create -l 1G /largepage-1G-<domain>
>>
>> so didn't need an explicit NUMA configuration parameter. In C one would
>> use cpuset_setdomain(2) instead, but that's not as convenient. So,
>> imbuing a NUMA domain in struct shm_largepage_conf is still probably a
>> reasonable thing to do.
>
> I just looked at the code, this seems very manageable. I'll draft up a
> review.
>
>>> It should be very easy to implement: extend shm_largepage_conf to
>>> include a NUMA domain parameter, and specify that domain when
>>> allocating pages for the object (in shm_largepage_dotruncate(), the
>>> vm_page_alloc_contig() call should become a
>>> vm_page_alloc_contig_domain() call).
>>>
>>>> Otherwise, in a perfect world, I'd like a unified interface for both
>>>> Linux and FreeBSD.
>>>> Linux hugepages are managed using standard system calls;
>>>> files are mmap(2)'d into virtual address space from hugetlbfs and
>>>> ftruncate(2)'d.
>>> largepage shm objects work this way as well.
>
> After reading through the man page, this is quite apparent. Not sure
> how I failed to make that connection. Anyway, this is starting to look
> easier than I thought it would be. The only difference from a
> userspace perspective that I can think of right now is how the pages
> are created (e.g. hugetlbfs open(2) on Linux vs.
> shm_create_largepage(2) on FreeBSD).

I suppose I should clarify that hugetlbfs open(2) does not create a
hugepage, but rather attaches to one. So it would be analogous to
shm_open(2) instead of shm_create_largepage(2). The hugepages are
created at boot time or via sysfs on Linux. My mistake.

Jake Freeland

>
> Thanks for the guidance Mark and Konstantin.
>
> Jake Freeland
>>>> A matching interface would not add an extra kernel
>>>> entrypoint and even more importantly, it would ease the
>>>> Linux-to-FreeBSD porting process for programs that use hugepages.
>
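
For readers following the thread, here is a minimal sketch of the userspace flow discussed above: create a largepage shm object, size it with ftruncate(2), and mmap(2) it. It is only an illustration under stated assumptions, not code from the thread or from DPDK. The helper name map_region() and the object name in the usage note are hypothetical. The shm_create_largepage() call assumes the prototype documented in FreeBSD's shm_open(2) man page (path, flags, psind, alloc_policy, mode); on other systems the sketch falls back to a plain shm_open(2) object so that it still builds, without any largepage backing.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or attach to) a named shm object and map
 * `len` bytes of it read/write. `psind` is an index into the array
 * returned by getpagesizes(3) and is only used on FreeBSD.
 * Returns NULL on any failure.
 */
void *
map_region(const char *name, size_t len, int psind)
{
	void *p;
	int fd;

	assert(name != NULL && len > 0);

#ifdef __FreeBSD__
	/* Assumed signature, per shm_open(2) on FreeBSD 13 and later. */
	fd = shm_create_largepage(name, O_CREAT | O_RDWR, psind,
	    SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
#else
	/* Fallback for non-FreeBSD systems: ordinary POSIX shm object. */
	(void)psind;
	fd = shm_open(name, O_CREAT | O_RDWR, 0600);
#endif
	if (fd < 0)
		return (NULL);

	/* For a largepage object, len should be a multiple of the page size. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return (NULL);
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* The mapping stays valid after the descriptor is closed. */
	return (p == MAP_FAILED ? NULL : p);
}
```

A caller would do something like map_region("/dpdk-demo", len, 1), then munmap(2) the region and shm_unlink(2) the name when finished. A second process that only wants to attach would use shm_open(2) on the same name, which matches the correction above about hugetlbfs open(2) being analogous to shm_open(2). Pinning the object to a NUMA domain is not shown; per the thread that currently requires a cpuset policy such as cpuset_setdomain(2).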