Date: Thu, 25 Jul 2024 16:18:16 -0400
From: Mark Johnston
To: Jake Freeland
Cc: Konstantin Belousov, freebsd-hackers@freebsd.org
Subject: Re: FreeBSD hugepages
In-Reply-To: <35da66f9-b913-45ea-90f4-16a2fa072848@technologyfriends.net>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Thu, Jul 25, 2024 at 02:47:16PM -0500, Jake Freeland wrote:
> On 7/25/24 14:02, Konstantin Belousov wrote:
> > On Thu, Jul 25, 2024 at 01:46:17PM -0500, Jake Freeland wrote:
> > > Hi there,
> > >
> > > I have been steadily working on bringing Data Plane Development Kit (DPDK)
> > > on FreeBSD up to date with the Linux version. The most significant hurdle so
> > > far has been supporting concurrent DPDK processes, each with their own
> > > contiguous memory regions.
> > >
> > > These contiguous regions are used by DPDK as a heap for allocating DMA
> > > buffers and other miscellaneous resources. Retrieving the underlying memory
> > > and mapping these regions is currently different on Linux and FreeBSD:
> > >
> > > On Linux, hugepages are fetched from the kernel's pre-allocated hugepage
> > > pool and are mapped into virtual address space on DPDK initialization. Since
> > > the hugepages exist in a pool, multiple processes can reserve their own
> > > hugepages and operate concurrently.
> > >
> > > On FreeBSD, DPDK uses an in-house contigmem kernel module that reserves a
> > > large contiguous region of memory on load. During DPDK initialization, the
> > > entire region is mapped into virtual address space. This leaves no memory
> > > for another independent DPDK process, so only one process can operate at a
> > > time.
> > >
> > > I could modify the DPDK contigmem module to mimic Linux's hugepages, but I
> > > thought it would be better to integrate and upstream a hugepage-like
> > > interface directly in the FreeBSD kernel source. I am writing this email to
> > > see if anyone has any advice on the matter. I did not see any previous
> > > attempts at this in Phabricator or the commit log, but it is possible that I
> > > missed it. I have read about transparent superpage promotion, but that seems
> > > like a different mechanism altogether.
> > >
> > > At a quick glance, the implementation seems straightforward: read some
> > > loader tunables, allocate persistent hugepages at boot time, and create a
> > > pseudo filesystem that supports creating and mapping hugepages. I could be
> > > underestimating the magnitude of this task, but that is why I'm asking for
> > > thoughts and advice :)
> > >
> > > For reference, here is Linux's documentation on hugepages:
> > > https://docs.kernel.org/admin-guide/mm/hugetlbpage.html
> > Are posix shm largepages objects enough (they were developed to support
> > DPDK)? Look for shm_create_largepage(3).
> Yes, shm_create_largepage(2) looks promising, but I would like the ability
> to allocate these largepages at boot time when memory fragmentation is at a
> minimum. Perhaps a couple sysctl tunables could be added onto the
> vm.largepages node to specify a pagesize and allocate some number of pages
> at boot?

We could add an rc script which creates named largepage objects. This
can be done using the posixshmcontrol utility.

That might not be early enough during boot for some purposes. In that
case, we could have a module which creates such objects from within the
kernel. This is pretty straightforward to do. I wrote a dumb version of
this for a mips-specific project a few years ago; feel free to take code
or inspiration from it: https://people.freebsd.org/~markj/tlbdemo.c

> It seems Linux had an interface similar to shm_create_largepage(2) back in
> v2.5, but they removed it in favor of their hugetlbfs filesystem. It would
> be nice to stay close to the file-backed Linux interface to maximize code
> sharing in userspace. It looks like the foundation for hugepages is there,
> but the interface for allocation and access needs to be extended.

POSIX shm objects have most of the properties one would want, I'd
expect, save the ability to access them via standard syscalls. What
else is missing besides the ability to reserve memory at boot time?
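
For readers unfamiliar with the interface discussed above, here is a
rough userspace sketch of creating and mapping a named largepage object
with shm_create_largepage(). It is only an illustration, not code from
this thread or from DPDK. The function names, the choice of psind 1 (the
first superpage size reported by getpagesizes(3)), the 0600 mode, and the
default allocation policy are all assumptions. It assumes FreeBSD 13 or
later; on other systems the sketch simply fails with ENOSYS.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/param.h>		/* MAXPAGESIZES */
#endif

/* Round len up to a multiple of pagesz, which must be a power of two. */
size_t
round_up_to_page(size_t len, size_t pagesz)
{
	assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
	return ((len + pagesz - 1) & ~(pagesz - 1));
}

/*
 * Hypothetical helper: create (or open) the named largepage shm object,
 * size it to hold at least len bytes, and map it shared and read/write.
 * The mapped length is stored in *maplen.  Returns MAP_FAILED with errno
 * set on failure.
 */
void *
map_largepage_region(const char *name, size_t len, size_t *maplen)
{
#ifdef __FreeBSD__
	size_t ps[MAXPAGESIZES];
	void *addr;
	int fd, n, saved;

	n = getpagesizes(ps, MAXPAGESIZES);
	if (n < 2) {
		/* No superpage size available (assumption: use index 1). */
		errno = ENOTSUP;
		return (MAP_FAILED);
	}

	fd = shm_create_largepage(name, O_CREAT | O_RDWR, 1,
	    SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
	if (fd < 0)
		return (MAP_FAILED);

	/* The object size must be a multiple of the large page size. */
	*maplen = round_up_to_page(len, ps[1]);
	if (ftruncate(fd, (off_t)*maplen) != 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return (MAP_FAILED);
	}

	addr = mmap(NULL, *maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* the mapping keeps the object referenced */
	errno = saved;
	return (addr);
#else
	/* shm_create_largepage() is FreeBSD-specific. */
	(void)name;
	(void)len;
	(void)maplen;
	errno = ENOSYS;
	return (MAP_FAILED);
#endif
}
```

A second process could call the same helper with the same name to map
the object, which is roughly the sharing model a multi-process DPDK setup
would need. Whether the allocation succeeds at run time depends on how
fragmented physical memory is, which is the boot-time reservation concern
raised in the thread.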