From: Mark Johnston
Date: Fri, 3 Feb 2023 16:50:35 GMT
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
Subject: git: 5f03f96fbefb - main - shm: Document shm_create_largepage()
Message-Id: <202302031650.313GoZ8k009134@gitrepo.freebsd.org>

The branch main has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=5f03f96fbefbb5c68a5d7d06728ff5b4a05f87b0

commit 5f03f96fbefbb5c68a5d7d06728ff5b4a05f87b0
Author:     Mark Johnston
AuthorDate: 2023-02-03 15:55:30 +0000
Commit:     Mark Johnston
CommitDate: 2023-02-03 16:48:25 +0000

    shm: Document shm_create_largepage()

    While here, move notes about FreeBSD-specific functionality to the
    COMPATIBILITY section, and document the ECAPMODE error for shm_open().

    Reviewed by:    pauamma, kib
    MFC after:      2 weeks
    Sponsored by:   Klara, Inc.
    Sponsored by:   Juniper Networks, Inc.
    Differential Revision:  https://reviews.freebsd.org/D38282
---
 lib/libc/sys/Makefile.inc |   1 +
 lib/libc/sys/shm_open.2   | 171 +++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 162 insertions(+), 10 deletions(-)

diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index 5e2c3da198b0..a86d7d160b6c 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -482,6 +482,7 @@ MLINKS+=setuid.2 setegid.2 \
 	setuid.2 setgid.2
 MLINKS+=shmat.2 shmdt.2
 MLINKS+=shm_open.2 memfd_create.3 \
+	shm_open.2 shm_create_largepage.3 \
 	shm_open.2 shm_unlink.2 \
 	shm_open.2 shm_rename.2
 MLINKS+=sigwaitinfo.2 sigtimedwait.2
diff --git a/lib/libc/sys/shm_open.2 b/lib/libc/sys/shm_open.2
index 4c03288b6bbe..a728bd0d4abf 100644
--- a/lib/libc/sys/shm_open.2
+++ b/lib/libc/sys/shm_open.2
@@ -28,11 +28,11 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd June 25, 2021
+.Dd January 30, 2023
 .Dt SHM_OPEN 2
 .Os
 .Sh NAME
-.Nm memfd_create , shm_open , shm_rename, shm_unlink
+.Nm memfd_create , shm_create_largepage , shm_open , shm_rename, shm_unlink
 .Nd "shared memory object operations"
 .Sh LIBRARY
 .Lb libc
@@ -43,6 +43,14 @@
 .Ft int
 .Fn memfd_create "const char *name" "unsigned int flags"
 .Ft int
+.Fo shm_create_largepage
+.Fa "const char *path"
+.Fa "int flags"
+.Fa "int psind"
+.Fa "int alloc_policy"
+.Fa "mode_t mode"
+.Fc
+.Ft int
 .Fn shm_open "const char *path" "int flags" "mode_t mode"
 .Ft int
 .Fn shm_rename "const char *path_from" "const char *path_to" "int flags"
@@ -51,7 +59,7 @@
 .Sh DESCRIPTION
 The
 .Fn shm_open
-system call opens (or optionally creates) a
+function opens (or optionally creates) a
 POSIX
 shared memory object named
 .Fa path .
@@ -114,9 +122,7 @@ see
 and
 .Xr fcntl 2 .
 .Pp
-As a
-.Fx
-extension, the constant
+The constant
 .Dv SHM_ANON
 may be used for the
 .Fa path
@@ -143,6 +149,131 @@ will fail with
 All other flags are ignored.
 .Pp
 The
+.Fn shm_create_largepage
+function behaves similarly to
+.Fn shm_open ,
+except that the
+.Dv O_CREAT
+flag is implicitly specified, and the returned
+.Dq largepage
+object is always backed by aligned, physically contiguous chunks of memory.
+This ensures that the object can be mapped using so-called
+.Dq superpages ,
+which can improve application performance in some workloads by reducing the
+number of translation lookaside buffer (TLB) entries required to access a
+mapping of the object,
+and by reducing the number of page faults performed when accessing a mapping.
+This happens automatically for all largepage objects.
+.Pp
+An existing largepage object can be opened using the
+.Fn shm_open
+function.
+Largepage shared memory objects behave slightly differently from non-largepage
+objects:
+.Bl -bullet -offset indent
+.It
+Memory for a largepage object is allocated when the object is
+extended using the
+.Xr ftruncate 2
+system call, whereas memory for regular shared memory objects is allocated
+lazily and may be paged out to a swap device when not in use.
+.It
+The size of a mapping of a largepage object must be a multiple of the
+underlying large page size.
+Most attributes of such a mapping can only be modified at the granularity
+of the large page size.
+For example, when using
+.Xr munmap 2
+to unmap a portion of a largepage object mapping, or when using
+.Xr mprotect 2
+to adjust protections of a mapping of a largepage object, the starting address
+must be large page size-aligned, and the length of the operation must be a
+multiple of the large page size.
+If not, the corresponding system call will fail and set
+.Va errno
+to
+.Er EINVAL .
+.El
+.Pp
+The
+.Fa psind
+argument to
+.Fn shm_create_largepage
+specifies the size of large pages used to back the object.
+This argument is an index into the page sizes array returned by
+.Xr getpagesizes 3 .
+In particular, all large pages backing a largepage object must be of the
+same size.
+For example, on a system with large page sizes of 2MB and 1GB, a 2GB largepage
+object will consist of either 1024 2MB pages, or 2 1GB pages, depending on
+the value specified for the
+.Fa psind
+argument.
+The
+.Fa alloc_policy
+parameter specifies what happens when an attempt to use
+.Xr ftruncate 2
+to allocate memory for the object fails.
+The following values are accepted:
+.Bl -tag -offset indent -width SHM_
+.It Dv SHM_LARGEPAGE_ALLOC_DEFAULT
+If the (non-blocking) memory allocation fails because there is insufficient free
+contiguous memory, the kernel will attempt to defragment physical memory and
+try another allocation.
+The subsequent allocation may or may not succeed.
+If this subsequent allocation also fails,
+.Xr ftruncate 2
+will fail and set
+.Va errno
+to
+.Er ENOMEM .
+.It Dv SHM_LARGEPAGE_ALLOC_NOWAIT
+If the memory allocation fails,
+.Xr ftruncate 2
+will fail and set
+.Va errno
+to
+.Er ENOMEM .
+.It Dv SHM_LARGEPAGE_ALLOC_HARD
+The kernel will attempt defragmentation until the allocation succeeds,
+or an unblocked signal is delivered to the thread.
+However, it is possible for physical memory to be fragmented such that the
+allocation will never succeed.
+.El
+.Pp
+The
+.Dv FIOSSHMLPGCNF
+and
+.Dv FIOGSHMLPGCNF
+.Xr ioctl 2
+commands can be used with a largepage shared memory object to get and set
+largepage object parameters.
+Both commands operate on the following structure:
+.Bd -literal
+struct shm_largepage_conf {
+	int psind;
+	int alloc_policy;
+};
+
+.Ed
+The
+.Dv FIOGSHMLPGCNF
+command populates this structure with the current values of these parameters,
+while the
+.Dv FIOSSHMLPGCNF
+command modifies the largepage object.
+Currently only the
+.Va alloc_policy
+parameter may be modified.
+Internally,
+.Fn shm_create_largepage
+works by creating a regular shared memory object using
+.Fn shm_open ,
+and then converting it into a largepage object using the
+.Dv FIOSSHMLPGCNF
+ioctl command.
+.Pp
+The
 .Fn shm_rename
 system call atomically removes a shared memory object named
 .Fa path_from
@@ -162,10 +293,6 @@ Return an error if an shm exists at
 .Fa path_to ,
 rather than unlinking it.
 .El
-.Fn shm_rename
-is also a
-.Fx
-extension.
 .Pp
 The
 .Fn shm_unlink
@@ -235,6 +362,17 @@ All functions return -1 on failure, and set
 to indicate the error.
 .Sh COMPATIBILITY
 The
+.Fn shm_create_largepage
+and
+.Fn shm_rename
+functions are
+.Fx
+extensions, as is support for the
+.Dv SHM_ANON
+value in
+.Fn shm_open .
+.Pp
+The
 .Fa path ,
 .Fa path_from ,
 and
@@ -377,6 +515,18 @@ and
 are specified and the named shared memory object does exist.
 .It Bq Er EACCES
 The required permissions (for reading or reading and writing) are denied.
+.It Bq Er ECAPMODE
+The process is running in capability mode (see
+.Xr capsicum 4 )
+and attempted to create a named shared memory object.
+.El
+.Pp
+.Fn shm_create_largepage
+can fail for the reasons listed above.
+It also fails with these error codes for the following conditions:
+.Bl -tag -width Er
+.It Bq Er ENOTTY
+The kernel does not support large pages on the current platform.
 .El
 .Pp
 The following errors are defined for
@@ -425,6 +575,7 @@ requires write permission to the shared memory object.
 .Xr close 2 ,
 .Xr fstat 2 ,
 .Xr ftruncate 2 ,
+.Xr ioctl 2 ,
 .Xr mmap 2 ,
 .Xr munmap 2 ,
 .Xr sendfile 2
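The psind and alignment rules added to shm_open(2) in this commit reduce to simple arithmetic. The C sketch below illustrates them. lp_page_count(), lp_pick_psind() and lp_range_ok() are hypothetical helpers, not libc functions, and the page-size array is assumed to be ordered the way getpagesizes(3) fills it (ascending, index 0 being the base page size, so usable psind values start at 1).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of libc.  `sizes` is assumed to be laid
 * out like the array filled in by getpagesizes(3): ascending order, with
 * index 0 holding the base page size.
 */

/* Number of large pages of size pgsz backing an object of objsz bytes. */
size_t
lp_page_count(size_t objsz, size_t pgsz)
{
	return (pgsz == 0 ? 0 : objsz / pgsz);
}

/*
 * Return the largest psind (>= 1) whose page size evenly divides objsz,
 * or -1 if no large page size fits.
 */
int
lp_pick_psind(const size_t *sizes, int nsizes, size_t objsz)
{
	for (int i = nsizes - 1; i >= 1; i--) {
		if (sizes[i] != 0 && objsz % sizes[i] == 0)
			return (i);
	}
	return (-1);
}

/*
 * Check the rule for munmap(2)/mprotect(2) on a largepage mapping: the
 * start address must be aligned to the large page size and the length
 * must be a multiple of it.
 */
bool
lp_range_ok(uintptr_t addr, size_t len, size_t pgsz)
{
	return (pgsz != 0 && addr % pgsz == 0 && len % pgsz == 0);
}
```

Assuming sizes of {4KB, 2MB, 1GB} and a 64-bit size_t, a 2GB object yields 1024 pages at psind 1 and 2 pages at psind 2, matching the example in the new text. Preferring the largest size that divides the object evenly is just one possible policy for choosing psind; the kernel does not choose it for you.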
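A minimal FreeBSD-only sketch of the workflow the new text describes: create an anonymous largepage object, read its parameters with FIOGSHMLPGCNF, change only alloc_policy with FIOSSHMLPGCNF, then allocate the memory with ftruncate(2). make_largepage_object() is a hypothetical helper. It assumes a FreeBSD release that provides shm_create_largepage() (believed to be 13.0 or later), that struct shm_largepage_conf is visible from <sys/mman.h> and the ioctl commands from <sys/filio.h>, and that size is a multiple of the chosen large page size. On other systems it compiles to a stub that fails with ENOSYS.

```c
#include <sys/types.h>

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__FreeBSD__)
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/filio.h>
#endif

/*
 * Hypothetical helper: create an anonymous largepage object backed by
 * pages of index `psind` and size it to `size` bytes (assumed to be a
 * multiple of the large page size).  Returns a descriptor, or -1 with
 * errno set.
 */
int
make_largepage_object(int psind, size_t size)
{
#if defined(__FreeBSD__)
	struct shm_largepage_conf conf;
	int fd, saved;

	/* O_CREAT is implied by shm_create_largepage(). */
	fd = shm_create_largepage(SHM_ANON, O_RDWR, psind,
	    SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
	if (fd == -1)
		return (-1);	/* e.g. ENOTTY if large pages are unsupported */

	/* Read the current parameters, then change only alloc_policy. */
	if (ioctl(fd, FIOGSHMLPGCNF, &conf) == -1)
		goto fail;
	conf.alloc_policy = SHM_LARGEPAGE_ALLOC_NOWAIT;
	if (ioctl(fd, FIOSSHMLPGCNF, &conf) == -1)
		goto fail;

	/* Physical memory is allocated here rather than lazily. */
	if (ftruncate(fd, (off_t)size) == -1)
		goto fail;
	return (fd);
fail:
	saved = errno;
	close(fd);
	errno = saved;
	return (-1);
#else
	/* Stub for systems without largepage shared memory objects. */
	(void)psind;
	(void)size;
	errno = ENOSYS;
	return (-1);
#endif
}
```

Reading the configuration before writing it back keeps psind unchanged, which matters because the text notes that only alloc_policy may currently be modified. With SHM_LARGEPAGE_ALLOC_NOWAIT, the ftruncate(2) call fails with ENOMEM instead of triggering defragmentation when contiguous memory is short.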