Re: git: 0de03c306c2f - main - man9: Add an smr(9) manual page

From: Zhenlei Huang <zlei_at_FreeBSD.org>
Date: Mon, 20 Feb 2023 15:30:46 UTC
Hi,

Glad to have this manual page !

Best regards,
Zhenlei

> On Feb 20, 2023, at 10:17 PM, Mark Johnston <markj@FreeBSD.org> wrote:
> 
> The branch main has been updated by markj:
> 
> URL: https://cgit.FreeBSD.org/src/commit/?id=0de03c306c2f95580e56c65b74ebf8c04c1416b3
> 
> commit 0de03c306c2f95580e56c65b74ebf8c04c1416b3
> Author:     Mark Johnston <markj@FreeBSD.org>
> AuthorDate: 2023-02-20 13:58:19 +0000
> Commit:     Mark Johnston <markj@FreeBSD.org>
> CommitDate: 2023-02-20 13:58:19 +0000
> 
>    man9: Add an smr(9) manual page
> 
>    Also update the UMA manual page to mention its SMR-enabled
>    functionality, and update locking.9 to mention both epoch and SMR.
>    Details of its usage are provided in the SMR manual page.
> 
>    Reviewed by:    Olivier Certner, mhorne, kib
>    MFC after:      2 weeks
>    Sponsored by:   The FreeBSD Foundation
>    Differential Revision:  https://reviews.freebsd.org/D38108
> ---
> share/man/man9/Makefile  |   9 ++
> share/man/man9/locking.9 |  37 ++++--
> share/man/man9/smr.9     | 294 +++++++++++++++++++++++++++++++++++++++++++++++
> share/man/man9/zone.9    |  42 ++++++-
> 4 files changed, 373 insertions(+), 9 deletions(-)
> 
> diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
> index b2c141210ab9..ab6f146fa2a9 100644
> --- a/share/man/man9/Makefile
> +++ b/share/man/man9/Makefile
> @@ -319,6 +319,7 @@ MAN=	accept_filter.9 \
> 	signal.9 \
> 	sleep.9 \
> 	sleepqueue.9 \
> +	smr.9 \
> 	socket.9 \
> 	stack.9 \
> 	store.9 \
> @@ -2051,6 +2052,14 @@ MLINKS+=sleepqueue.9 init_sleepqueues.9 \
> 	sleepqueue.9 sleepq_type.9 \
> 	sleepqueue.9 sleepq_wait.9 \
> 	sleepqueue.9 sleepq_wait_sig.9
> +MLINKS+=smr.9 smr_advance.9 \
> +	smr.9 smr_create.9 \
> +	smr.9 smr_destroy.9 \
> +	smr.9 smr_enter.9 \
> +	smr.9 smr_exit.9 \
> +	smr.9 smr_poll.9 \
> +	smr.9 smr_synchronize.9 \
> +	smr.9 smr_wait.9
> MLINKS+=socket.9 soabort.9 \
> 	socket.9 soaccept.9 \
> 	socket.9 sobind.9 \
> diff --git a/share/man/man9/locking.9 b/share/man/man9/locking.9
> index 145ae6e9f1f5..e1e4ccd33664 100644
> --- a/share/man/man9/locking.9
> +++ b/share/man/man9/locking.9
> @@ -24,7 +24,7 @@
> .\"
> .\" $FreeBSD$
> .\"
> -.Dd March 29, 2022
> +.Dd February 3, 2023
> .Dt LOCKING 9
> .Os
> .Sh NAME
> @@ -145,6 +145,32 @@ for this reason, they should be avoided.
> See
> .Xr lock 9
> for details.
> +.Ss Non-blocking synchronization
> +The kernel has two facilities,
> +.Xr epoch 9
> +and
> +.Xr smr 9 ,
> +which can be used to provide read-only access to a data structure while one or
> +more writers are concurrently modifying the data structure.
> +Specifically, readers using
> +.Xr epoch 9
> +and
> +.Xr smr 9
> +to synchronize accesses do not block writers, in contrast with reader/writer
> +locks, and they help ensure that memory freed by writers is not reused until
> +all readers which may be accessing it have finished.
> +Thus, they are a useful building block in the construction of lock-free
> +data structures.
> +.Pp
> +These facilities are difficult to use correctly and should be avoided
> +in preference to traditional mutual exclusion-based synchronization,
> +except when performance or non-blocking guarantees are a major concern.
> +.Pp
> +See
> +.Xr epoch 9
> +and
> +.Xr smr 9
> +for details.
> .Ss Counting semaphores
> Counting semaphores provide a mechanism for synchronizing access
> to a pool of resources.
> @@ -394,8 +420,10 @@ At this time this is a rather easy to remember table.
> .Sh SEE ALSO
> .Xr lockstat 1 ,
> .Xr witness 4 ,
> +.Xr atomic 9 ,
> .Xr BUS_SETUP_INTR 9 ,
> .Xr condvar 9 ,
> +.Xr epoch 9 ,
> .Xr lock 9 ,
> .Xr LOCK_PROFILING 9 ,
> .Xr mtx_pool 9 ,
> @@ -404,13 +432,8 @@ At this time this is a rather easy to remember table.
> .Xr rwlock 9 ,
> .Xr sema 9 ,
> .Xr sleep 9 ,
> +.Xr smr 9 ,
> .Xr sx 9 ,
> .Xr timeout 9
> -.Sh HISTORY
> -These
> -functions appeared in
> -.Bsx 4.1
> -through
> -.Fx 7.0 .
> .Sh BUGS
> There are too many locking primitives to choose from.
> diff --git a/share/man/man9/smr.9 b/share/man/man9/smr.9
> new file mode 100644
> index 000000000000..7b2ed65b4368
> --- /dev/null
> +++ b/share/man/man9/smr.9
> @@ -0,0 +1,294 @@
> +.\" SPDX-License-Identifier: BSD-2-Clause
> +.\"
> +.\" Copyright (c) 2023 The FreeBSD Foundation
> +.\"
> +.\" This documentation was written by Mark Johnston <markj@FreeBSD.org>
> +.\" under sponsorship from the FreeBSD Foundation.
> +.\"
> +.\" Redistribution and use in source and binary forms, with or without
> +.\" modification, are permitted provided that the following conditions
> +.\" are met:
> +.\" 1. Redistributions of source code must retain the above copyright
> +.\"    notice, this list of conditions and the following disclaimer.
> +.\" 2. Redistributions in binary form must reproduce the above copyright
> +.\"    notice, this list of conditions and the following disclaimer in the
> +.\"    documentation and/or other materials provided with the distribution.
> +.\"
> +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> +.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> +.\" SUCH DAMAGE.
> +.\"
> +.Dd January 17, 2023
> +.Dt SMR 9
> +.Os
> +.Sh NAME
> +.Nm smr
> +.Nd safe memory reclamation for lock-free data structures
> +.Sh SYNOPSIS
> +.In sys/smr.h
> +.Ft smr_seq_t
> +.Fo smr_advance
> +.Fa "smr_t smr"
> +.Fc
> +.Ft smr_t
> +.Fo smr_create
> +.Fa "const char *name"
> +.Fc
> +.Ft void
> +.Fo smr_destroy
> +.Fa "smr_t smr"
> +.Fc
> +.Ft void
> +.Fo smr_enter
> +.Fa "smr_t smr"
> +.Fc
> +.Ft void
> +.Fo smr_exit
> +.Fa "smr_t smr"
> +.Fc
> +.Ft bool
> +.Fo smr_poll
> +.Fa "smr_t smr"
> +.Fa "smr_seq_t goal"
> +.Fa "bool wait"
> +.Fc
> +.Ft void
> +.Fo smr_synchronize
> +.Fa "smr_t smr"
> +.Fc
> +.Ft bool
> +.Fo smr_wait
> +.Fa "smr_t smr"
> +.Fa "smr_seq_t goal"
> +.Fc
> +.Sh DESCRIPTION
> +Safe Memory Reclamation (SMR) is a facility which enables the implementation of
> +memory-safe lock-free data structures.
> +In typical usage, read accesses to an SMR-protected data structure, such as a
> +hash table or tree, are performed in a
> +.Dq read section
> +consisting of code bracketed by
> +.Fn smr_enter
> +and
> +.Fn smr_exit
> +calls, while mutations of the data structure are serialized by a traditional
> +mutex such as
> +.Xr mutex 9 .
> +In contrast with reader-writer locks such as
> +.Xr rwlock 9 ,
> +.Xr rmlock 9 ,
> +and
> +.Xr sx 9 ,
> +SMR allows readers and writers to access the data structure concurrently.
> +Readers can always enter a read section immediately
> +.Po
> +.Fn smr_enter
> +never blocks
> +.Pc ,
> +so mutations do not introduce read latency.
> +Furthermore,
> +.Fn smr_enter
> +and
> +.Fn smr_exit
> +operate only on per-CPU data and thus avoid some of the performance problems
> +inherent in the implementation of traditional reader-writer mutexes.
> +SMR can therefore be a useful building block for data structures which are
> +accessed frequently but are only rarely modified.
> +.Pp
> +Note that any SMR-protected data structure must be implemented carefully such
> +that operations behave correctly in the absence of mutual exclusion between
> +readers and writers.
> +The data structure must be designed to be lock-free; SMR merely facilitates
> +the implementation, for example by making it safe to follow dangling pointers
> +and by helping avoid the ABA problem.
> +.Pp
> +When shared accesses to and mutations of a data structure can proceed
> +concurrently, writers must take care to ensure that any items removed from the
> +structure are not freed and recycled while readers are accessing them in
> +parallel.
> +This requirement results in a two-phase approach to the removal of items:
> +first, the item is unlinked such that all pointers to the item are removed from
> +the structure, preventing any new readers from observing the item.
> +Then, the writer waits until some mechanism guarantees that no existing readers
> +are still accessing the item.
> +At that point the memory for that item can be freed and reused safely.
> +SMR provides this mechanism: readers may access a lock-free data structure in
> +between calls to the
> +.Fn smr_enter
> +and
> +.Fn smr_exit
> +functions, which together create a read section, and the
> +.Fn smr_advance ,
> +.Fn smr_poll ,
> +.Fn smr_wait ,
> +and
> +.Fn smr_synchronize
> +functions can be used to wait for threads in read sections to finish.
> +All of these functions operate on a
> +.Ft smr_t
> +state block which holds both per-CPU and global state.
> +Readers load global state and modify per-CPU state, while writers must scan all
> +per-CPU states to detect active readers.
> +SMR is designed to amortize this cost by batching to give acceptable
> +performance in write-heavy workloads.
> +.Ss Readers
> +Threads enter a read section by calling
> +.Fn smr_enter .
> +Read sections should be short, and many operations are not permitted while in
> +a read section.
> +Specifically, context switching is disabled, and thus readers may not acquire
> +blocking mutexes such as
> +.Xr mutex 9 .
> +Another consequence of this is that the thread is pinned to the current CPU for
> +the duration of the read section.
> +Furthermore, read sections may not be nested: it is incorrect to call
> +.Fn smr_enter
> +with a given
> +.Ft smr_t
> +state block when already in a read section for that state block.
> +.Ss UMA Integration
> +To simplify the integration of SMR into consumers, the
> +.Xr uma 9
> +kernel memory allocator provides some SMR-specified facilities.
> +This eliminates a good deal of complexity from the implementation of consumers
> +and automatically batches write operations.
> +.Pp
> +In typical usage, a UMA zone (created with the
> +.Dv UMA_ZONE_SMR
> +flag or initialized with the
> +.Fn uma_zone_set_smr
> +function) is coupled with a
> +.Ft smr_t
> +state block.
> +To insert an item into an SMR-protected data structure, memory is allocated
> +from the zone using the
> +.Fn uma_zalloc_smr
> +function.
> +Insertions and removals are serialized using traditional mutual exclusion
> +and items are freed using the
> +.Fn uma_zfree_smr
> +function.
> +Read-only lookup operations are performed in SMR read sections.
> +.Fn uma_zfree_smr
> +waits for all active readers which may be accessing the freed item to finish
> +their read sections before recycling that item's memory.
> +.Pp
> +If the zone has an associated per-item destructor, it will be invoked at some
> +point when no readers can be accessing a given item.
> +The destructor can be used to release additional resources associated with the
> +item.
> +Note however that there is no guarantee that the destructor will be invoked in
> +a bounded time period.
> +.Ss Writers
> +Consumers are expected to use SMR in conjunction with UMA and thus need only
> +make use of the
> +.Fn smr_enter
> +and
> +.Fn smr_exit
> +functions, and the SMR helper macros defined in
> +.Pa sys/smr_types.h .
> +However, an introduction to the write-side interface of SMR can be useful.
> +.Pp
> +Internally, SMR maintains a global
> +.Ql write sequence
> +number.
> +When entering a read section,
> +.Fn smr_enter
> +loads a copy of the write sequence and stores it in per-CPU memory, hence
> +.Ql observing
> +that value.
> +To exit a read section, this per-CPU memory is overwritten with an invalid
> +value, making the CPU inactive.
> +Writers perform two operations: advancing the write sequence number, and
> +polling all CPUs to see whether active readers have observed a given sequence
> +number.
> +These operations are performed by
> +.Fn smr_advance
> +and
> +.Fn smr_poll ,
> +respectively, which do not require serialization between multiple writers.
> +.Pp
> +After a writer unlinks an item from a data structure, it increments the write
> +sequence number and tags the item with the new value returned by
> +.Fn smr_advance .
> +Once all CPUs have observed the new value, the writer can use
> +.Fn smr_poll
> +to deduce that no active readers have access to the unlinked item, and thus the
> +item is safe to recycle.
> +Because this pair of operations is relatively expensive, it is generally a good
> +idea to amortize this cost by accumulating a collection of multiple unlinked
> +items and tagging the entire batch with a target write sequence number.
> +.Pp
> +.Fn smr_poll
> +is a non-blocking operation and returns true only if all active readers are
> +guaranteed to have observed the target sequence number value.
> +.Fn smr_wait
> +is a variant of
> +.Fn smr_poll
> +which waits until all CPUs have observed the target sequence number value.
> +.Fn smr_synchronize
> +combines
> +.Fn smr_advance
> +with
> +.Fn smr_wait
> +to wait for all CPUs to observe a new write sequence number.
> +This is an expensive operation and should only be used if polling cannot be
> +deferred in some way.
> +.Ss Memory Ordering
> +The
> +.Fn smr_enter
> +function has acquire semantics, and the
> +.Fn smr_exit
> +function has release semantics.
> +The
> +.Fn smr_advance ,
> +.Fn smr_poll ,
> +.Fn smr_wait ,
> +and
> +.Fn smr_synchronize
> +functions should not be assumed to have any guarantees with respect to memory
> +ordering.
> +In practice, some of these functions have stronger ordering semantics than
> +is stated here, but this is specific to the implementation and should not be
> +relied upon.
> +See
> +.Xr atomic 9
> +for more details.
> +.Sh NOTES
> +Outside of
> +.Fx
> +the acronym SMR typically refers to a family of algorithms which enable
> +memory-safe read-only access to a data structure concurrent with modifications
> +to that data structure.
> +Here, SMR refers to a particular algorithm belonging to this family, as well as
> +its implementation in
> +.Fx .
> +See
> +.Pa sys/sys/smr.h
> +and
> +.Pa sys/kern/subr_smr.c
> +in the
> +.Fx
> +source tree for further details on the algorithm and the context.
> +.Pp
> +The acronym SMR is also used to mean "shingled magnetic recording", a
> +technology used to store data on hard disk drives which requires operating
> +system support.
> +These two uses of the acronym are unrelated.
> +.Sh SEE ALSO
> +.Xr atomic 9 ,
> +.Xr locking 9 ,
> +.Xr uma 9
> +.Sh AUTHORS
> +The SMR algorithm and its implementation were provided by
> +.An Jeff Roberson Aq Mt jeff@FreeBSD.org .
> +This manual page was written by
> +.An Mark Johnston Aq Mt markj@FreeBSD.org .
> diff --git a/share/man/man9/zone.9 b/share/man/man9/zone.9
> index 5c1b4d31d8aa..5f7f99b8f44f 100644
> --- a/share/man/man9/zone.9
> +++ b/share/man/man9/zone.9
> @@ -25,7 +25,7 @@
> .\"
> .\" $FreeBSD$
> .\"
> -.Dd February 15, 2022
> +.Dd January 16, 2023
> .Dt UMA 9
> .Os
> .Sh NAME
> @@ -79,6 +79,8 @@ typedef void (*uma_free)(void *item, vm_size_t size, uint8_t pflag);
> .Fn uma_zalloc_pcpu "uma_zone_t zone" "int flags"
> .Ft "void *"
> .Fn uma_zalloc_pcpu_arg "uma_zone_t zone" "void *arg" "int flags"
> +.Ft "void *"
> +.Fn uma_zalloc_smr "uma_zone_t zone" "int flags"
> .Ft void
> .Fn uma_zfree "uma_zone_t zone" "void *item"
> .Ft void
> @@ -88,6 +90,8 @@ typedef void (*uma_free)(void *item, vm_size_t size, uint8_t pflag);
> .Ft void
> .Fn uma_zfree_pcpu_arg "uma_zone_t zone" "void *item" "void *arg"
> .Ft void
> +.Fn uma_zfree_smr "uma_zone_t zone" "void *item"
> +.Ft void
> .Fn uma_prealloc "uma_zone_t zone" "int nitems"
> .Ft void
> .Fn uma_zone_reserve "uma_zone_t zone" "int nitems"
> @@ -117,6 +121,10 @@ typedef void (*uma_free)(void *item, vm_size_t size, uint8_t pflag);
> .Fn uma_zone_set_warning "uma_zone_t zone" "const char *warning"
> .Ft void
> .Fn uma_zone_set_maxaction "uma_zone_t zone" "void (*maxaction)(uma_zone_t)"
> +.Ft smr_t
> +.Fn uma_zone_get_smr "uma_zone_t zone"
> +.Ft void
> +.Fn uma_zone_set_smr "uma_zone_t zone" "smr_t smr"
> .In sys/sysctl.h
> .Fn SYSCTL_UMA_MAX parent nbr name access zone descr
> .Fn SYSCTL_ADD_UMA_MAX ctx parent nbr name access zone descr
> @@ -330,6 +338,14 @@ Cached items that have not been used for a long period may also be freed from
> zone.
> When this flag is set, the system will not reclaim memory from the zone's
> caches.
> +.It Dv UMA_ZONE_SMR
> +Create a zone whose items will be synchronized using the
> +.Xr smr 9
> +mechanism.
> +Upon creation the zone will have an associated
> +.Dt smr_t
> +structure which can be fetched using
> +.Fn uma_zone_get_smr .
> .El
> .Pp
> Zones can be destroyed using
> @@ -390,6 +406,17 @@ then
> does nothing.
> .Pp
> The
> +.Fn uma_zalloc_smr
> +and
> +.Fn uma_zfree_smr
> +functions allocate and free items from an SMR-enabled zone, that is,
> +a zone created with
> +.Dv UMA_ZONE_SMR
> +or a zone that has had
> +.Fn uma_zone_set_smr
> +called.
> +.Pp
> +The
> .Fn uma_zalloc_domain
> function allows callers to specify a fixed
> .Xr numa 4
> @@ -535,6 +562,16 @@ Therefore,
> this function should do very little work (similar to a signal handler).
> .Pp
> The
> +.Fn uma_zone_set_smr
> +function associates an existing
> +.Xr smr 9
> +structure with a UMA zone.
> +The effect is similar to creating a zone with the
> +.Dv UMA_ZONE_SMR
> +flag, except that a new SMR structure is not created.
> +This function must be called before any allocations from the zone are performed.
> +.Pp
> +The
> .Fn SYSCTL_UMA_MAX parent nbr name access zone descr
> macro declares a static
> .Xr sysctl 9
> @@ -577,7 +614,8 @@ non-executable memory.
> .Sh SEE ALSO
> .Xr numa 4 ,
> .Xr vmstat 8 ,
> -.Xr malloc 9
> +.Xr malloc 9 ,
> +.Xr smr 9
> .Rs
> .%A Jeff Bonwick
> .%T "The Slab Allocator: An Object-Caching Kernel Memory Allocator"