Date: Mon, 23 Jan 2023 22:12:00 GMT
Message-Id: <202301232212.30NMC0sK012852@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: "Alexander V. Chernikov"
Subject: git: 02b958b19535 - stable/13 - netlink: add netlink user documentation.
List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: melifaro X-Git-Repository: src X-Git-Refname: refs/heads/stable/13 X-Git-Reftype: branch X-Git-Commit: 02b958b19535828d8f19bf3601ae88ecf4503d33 Auto-Submitted: auto-generated X-ThisMailContainsUnwantedMimeParts: N The branch stable/13 has been updated by melifaro: URL: https://cgit.FreeBSD.org/src/commit/?id=02b958b19535828d8f19bf3601ae88ecf4503d33 commit 02b958b19535828d8f19bf3601ae88ecf4503d33 Author: Alexander V. Chernikov AuthorDate: 2022-11-01 12:20:13 +0000 Commit: Alexander V. Chernikov CommitDate: 2023-01-23 22:04:03 +0000 netlink: add netlink user documentation. Add netlink(4) as a "frontend" manpage describing netlink in general. Add rtnelink(4) describing supported commands and attributes in NETLINK_ROUTE family. Add genetlink(4) describing generic netlink API. Reviewed by: pauamma Differential Revision: https://reviews.freebsd.org/D37011 (cherry picked from commit 7366c0a49c9a60d3eea7520d7ae4bc2b3ab172f3) --- share/man/man4/genetlink.4 | 147 +++++++++++++ share/man/man4/netlink.4 | 344 ++++++++++++++++++++++++++++++ share/man/man4/rtnetlink.4 | 519 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 1010 insertions(+) diff --git a/share/man/man4/genetlink.4 b/share/man/man4/genetlink.4 new file mode 100644 index 000000000000..2c5b9b99f994 --- /dev/null +++ b/share/man/man4/genetlink.4 @@ -0,0 +1,147 @@ +.\" +.\" Copyright (C) 2022 Alexander Chernikov . +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd November 1, 2022 +.Dt GENETLINK 4 +.Os +.Sh NAME +.Nm genetlink +.Nd Generic Netlink +.Sh SYNOPSIS +.In netlink/netlink.h +.In netlink/netlink_generic.h +.Ft int +.Fn socket AF_NETLINK SOCK_DGRAM NETLINK_GENERIC +.Sh DESCRIPTION +The +.Dv NETLINK_GENERIC +is a "container" family, used for dynamic registration of other families +belonging to the various subsystems. +These subsystems provide a string family name during registration and +receive a dynamically-allocated family id. +Allocated family identifiers are then used by applications to get access to +functions provided by that subsystem via netlink. +There are standard methods for resolving string family names to family +identifiers. +A similar mechanism works for the notification groups provided by those +families. 
+.Pp +All generic netlink families share a common header: +.Bd -literal +struct genlmsghdr { + uint8_t cmd; /* command within the family */ + uint8_t version; /* ABI version for the cmd */ + uint16_t reserved; /* reserved: set to 0 */ +}; +.Ed +The family id is encoded in the +.Dv nlmsg_type +of the base netlink header. +The +.Va cmd +field is the command identifier within the family. +The +.Va version +field is the command version. +.Sh METHODS +The generic Netlink framework provides the base family, +.Dv GENL_ID_CTRL +("nlctrl") with a fixed family id. +This family is used to list the details of all registered families. +.Pp +The following messages are supported by the framework: +.Ss CTRL_CMD_GETFAMILY +Fetches a single family or all registered families, depending on the +.Dv NLM_F_DUMP +flag. +Each family is reported as +.Dv CTRL_CMD_NEWFAMILY +message. +The following filters are recognised by the kernel: +.Pp +.Bd -literal -offset indent -compact +CTRL_ATTR_FAMILY_ID (uint16_t) current family id assigned by kernel +CTRL_ATTR_FAMILY_NAME (string) family name +.Ed +.Ss TLVs +.Bl -tag -width indent +.It Dv CTRL_ATTR_FAMILY_ID +(uint16_t) Dynamically-assigned family identifier. +.It Dv CTRL_ATTR_FAMILY_NAME +(string) Family name. +.It Dv CTRL_ATTR_HDRSIZE +(uint32_t) Family mandatory header size (typically 0). +.It Dv CTRL_ATTR_MAXATTR +(uint32_t) Maximum attribute number valid for the family. +.It Dv CTRL_ATTR_OPS +(nested) List of the operations supported by the family. +The attribute consists of a list of nested TLVs, with attribute values +monotonically incremented, starting from 0. +The following attributes are present in each TLV: +.Bl -tag -width indent +.It Dv CTRL_ATTR_OP_ID +Operation (message) number. +.It Dv CTRL_ATTR_OP_FLAGS +Operation flags. 
+The following flags are supported: +.Bd -literal -offset indent -compact +GENL_ADMIN_PERM requires elevated permissions +GENL_CMD_CAP_DO operation is a modification request +GENL_CMD_CAP_DUMP operation is a get/dump request +.Ed +.El +.It Dv CTRL_ATTR_MCAST_GROUPS +(nested) List of the notification groups supported by the family. +The attribute consists of a list of nested TLVs, with attribute values +monotonically incremented, starting from 0. +The following attributes are present in each TLV: +.Bl -tag -width indent +.It Dv CTRL_ATTR_MCAST_GRP_ID +Group id that can be used in +.Dv NETLINK_ADD_MEMBERSHIP +.Xr setsockopt 2 . +.It Dv CTRL_ATTR_MCAST_GRP_NAME +(string) Human-readable name of the group. +.El +.El +.Ss Groups +The following groups are defined: +.Bd -literal -offset indent -compact +"notify" Notifies on family registrations/removal. +.Ed +.Sh SEE ALSO +.Xr netlink 4 +.Sh HISTORY +The +.Dv NETLINK_GENERIC +protocol family appeared in +.Fx 14.0 . +.Sh AUTHORS +The netlink was implemented by +.An -nosplit +.An Alexander Chernikov Aq Mt melifaro@FreeBSD.org . +It was derived from the Google Summer of Code 2021 project by +.An Ng Peng Nam Sean . diff --git a/share/man/man4/netlink.4 b/share/man/man4/netlink.4 new file mode 100644 index 000000000000..c75366f560f0 --- /dev/null +++ b/share/man/man4/netlink.4 @@ -0,0 +1,344 @@ +.\" +.\" Copyright (C) 2022 Alexander Chernikov . +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd November 1, 2022 +.Dt NETLINK 4 +.Os +.Sh NAME +.Nm Netlink +.Nd Kernel network configuration protocol +.Sh SYNOPSIS +.In netlink/netlink.h +.In netlink/netlink_route.h +.Ft int +.Fn socket AF_NETLINK SOCK_DGRAM int family +.Sh DESCRIPTION +Netlink is a user-kernel message-based communication protocol primarily used +for network stack configuration. +Netlink is easily extendable and supports large dumps and event +notifications, all via a single socket. +The protocol is fully asynchronous, allowing one to issue and track multiple +requests at once. +Netlink consists of multiple families, which commonly group the commands +belonging to the particular kernel subsystem. +Currently, the supported families are: +.Pp +.Bd -literal -offset indent -compact +NETLINK_ROUTE network configuration, +NETLINK_GENERIC "container" family +.Ed +.Pp +The +.Dv NETLINK_ROUTE +family handles all interfaces, addresses, neighbors, routes, and VNETs +configuration. +More details can be found in +.Xr rtnetlink 4 . +The +.Dv NETLINK_GENERIC +family serves as a +.Do container Dc , +allowing registering other families under the +.Dv NETLINK_GENERIC +umbrella. 
+This approach allows using a single netlink socket to interact with +multiple netlink families at once. +More details can be found in +.Xr genetlink 4 . +.Pp +Netlink has its own sockaddr structure: +.Bd -literal +struct sockaddr_nl { + uint8_t nl_len; /* sizeof(sockaddr_nl) */ + sa_family_t nl_family; /* netlink family */ + uint16_t nl_pad; /* reserved, set to 0 */ + uint32_t nl_pid; /* automatically selected, set to 0 */ + uint32_t nl_groups; /* multicast groups mask to bind to */ +}; +.Ed +.Pp +Typically, filling this structure is not required for socket operations. +It is presented here for completeness. +.Sh PROTOCOL DESCRIPTION +The protocol is message-based. +Each message starts with the mandatory +.Va nlmsghdr +header, followed by the family-specific header and the list of +type-length-value pairs (TLVs). +TLVs can be nested. +All headers and TLVs are padded to 4-byte boundaries. +Each +.Xr send 2 or +.Xr recv 2 +system call may contain multiple messages. +.Ss BASE HEADER +.Bd -literal +struct nlmsghdr { + uint32_t nlmsg_len; /* Length of message including header */ + uint16_t nlmsg_type; /* Message type identifier */ + uint16_t nlmsg_flags; /* Flags (NLM_F_) */ + uint32_t nlmsg_seq; /* Sequence number */ + uint32_t nlmsg_pid; /* Sending process port ID */ +}; +.Ed +.Pp +The +.Va nlmsg_len +field stores the whole message length, in bytes, including the header. +This length has to be rounded up to the nearest 4-byte boundary when +iterating over messages. +The +.Va nlmsg_type +field represents the command/request type. +This value is family-specific. +The list of supported commands can be found in the relevant family +header file. +.Va nlmsg_seq +is a user-provided request identifier. +An application can track the operation result using the +.Dv NLMSG_ERROR +messages and matching the +.Va nlmsg_seq +. +The +.Va nlmsg_pid +field is the message sender id. +This field is optional for userland. +The kernel sender id is zero. 
+The +.Va nlmsg_flags +field contains the message-specific flags. +The following generic flags are defined: +.Pp +.Bd -literal -offset indent -compact +NLM_F_REQUEST Indicates that the message is an actual request to the kernel +NLM_F_ACK Request an explicit ACK message with an operation result +.Ed +.Pp +The following generic flags are defined for the "GET" request types: +.Pp +.Bd -literal -offset indent -compact +NLM_F_ROOT Return the whole dataset +NLM_F_MATCH Return all entries matching the criteria +.Ed +These two flags are typically used together, aliased to +.Dv NLM_F_DUMP +.Pp +The following generic flags are defined for the "NEW" request types: +.Pp +.Bd -literal -offset indent -compact +NLM_F_CREATE Create an object if none exists +NLM_F_EXCL Don't replace an object if it exists +NLM_F_REPLACE Replace an existing matching object +NLM_F_APPEND Append to an existing object +.Ed +.Pp +The following generic flags are defined for the replies: +.Pp +.Bd -literal -offset indent -compact +NLM_F_MULTI Indicates that the message is part of the message group +NLM_F_DUMP_INTR Indicates that the state dump was not completed +NLM_F_DUMP_FILTERED Indicates that the dump was filtered per request +NLM_F_CAPPED Indicates the original message was capped to its header +NLM_F_ACK_TLVS Indicates that extended ACK TLVs were included +.Ed +.Ss TLVs +Most messages encode their attributes as type-length-value pairs (TLVs). +The base TLV header: +.Bd -literal +struct nlattr { + uint16_t nla_len; /* Total attribute length */ + uint16_t nla_type; /* Attribute type */ +}; +.Ed +The TLV type +.Pq Va nla_type +scope is typically the message type or group within a family. +For example, the +.Dv RTN_MULTICAST +type value is only valid for +.Dv RTM_NEWROUTE +, +.Dv RTM_DELROUTE +and +.Dv RTM_GETROUTE +messages. +TLVs can be nested; in that case internal TLVs may have their own sub-types. +All TLVs are packed with 4-byte padding. 
+.Ss CONTROL MESSAGES +A number of generic control messages are reserved in each family. +.Pp +.Dv NLMSG_ERROR +reports the operation result if requested, optionally followed by +the metadata TLVs. +The value of +.Va nlmsg_seq +is set to its value in the original messages, while +.Va nlmsg_pid +is set to the socket pid of the original socket. +The operation result is reported via +.Vt "struct nlmsgerr": +.Bd -literal +struct nlmsgerr { + int error; /* Standard errno */ + struct nlmsghdr msg; /* Original message header */ +}; +.Ed +If the +.Dv NETLINK_CAP_ACK +socket option is not set, the remainder of the original message will follow. +If the +.Dv NETLINK_EXT_ACK +socket option is set, kernel may add a +.Dv NLMSGERR_ATTR_MSG +string TLV with the textual error description, optionally followed by the +.Dv NLMSGERR_ATTR_OFFS +TLV, indicating the offset from the message start that triggered an error. +.Pp +.Dv NLMSG_DONE +indicates the end of the message group: typically, the end of the dump. +It contains a single +.Vt int +field, describing the dump result as a standard errno value. +.Sh SOCKET OPTIONS +Netlink supports a number of custom socket options, which can be set with +.Xr setsockopt 2 +with the +.Dv SOL_NETLINK +.Fa level : +.Bl -tag -width indent +.It Dv NETLINK_ADD_MEMBERSHIP +Subscribes to the notifications for the specific group (int). +.It Dv NETLINK_DROP_MEMBERSHIP +Unsubscribes from the notifications for the specific group (int). +.It Dv NETLINK_LIST_MEMBERSHIPS +Lists the memberships as a bitmask. +.It Dv NETLINK_CAP_ACK +Instructs the kernel to send the original message header in the reply +without the message body. +.It Dv NETLINK_EXT_ACK +Acknowledges ability to receive additional TLVs in the ACK message. +.El +.Pp +Additionally, netlink overrides the following socket options from the +.Dv SOL_SOCKET +.Fa level : +.Bl -tag -width indent +.It Dv SO_RCVBUF +Sets the maximum size of the socket receive buffer. 
+If the caller has +.Dv PRIV_NET_ROUTE +permission, the value can exceed the currently-set +.Va kern.ipc.maxsockbuf +value. +.El +.Sh SYSCTL VARIABLES +A set of +.Xr sysctl 8 +variables is available to tweak run-time parameters: +.Bl -tag -width indent +.It Va net.netlink.sendspace +Default send buffer for the netlink socket. +Note that the socket sendspace has to be at least as long as the longest +message that can be transmitted via this socket. +.El +.Bl -tag -width indent +.It Va net.netlink.recvspace +Default receive buffer for the netlink socket. +Note that the socket recvspace has to be at least as long as the longest +message that can be received from this socket. +.El +.Sh DEBUGGING +Netlink implements per-functional-unit debugging, with different severities +controllable via the +.Va net.netlink.debug +branch. +These messages are logged in the kernel message buffer and can be seen in +.Xr dmesg 8 +. +The following severity levels are defined: +.Bl -tag -width indent +.It Dv LOG_DEBUG(7) +Rare events or per-socket errors are reported here. +This is the default level, not impacting production performance. +.It Dv LOG_DEBUG2(8) +Socket events such as group memberships, privilege checks, commands and dumps +are logged. +This level does not incur significant performance overhead. +.It Dv LOG_DEBUG3(9) +All socket events and each dumped or modified entity are logged. +Turning it on may result in significant performance overhead. +.El +.Sh ERRORS +Netlink reports operation results, including errors and error metadata, by +sending a +.Dv NLMSG_ERROR +message for each request message. 
+The following errors can be returned: +.Bl -tag -width Er +.It Bq Er EPERM +when the current privileges are insufficient to perform the required operation; +.It Bo Er ENOBUFS Bc or Bo Er ENOMEM Bc +when the system runs out of memory for +an internal data structure; +.It Bq Er ENOTSUP +when the requested command is not supported by the family or +the family is not supported; +.It Bq Er EINVAL +when some necessary TLVs are missing or invalid, detailed info +may be provided in NLMSGERR_ATTR_MSG and NLMSGERR_ATTR_OFFS TLVs; +.It Bq Er ENOENT +when trying to delete a non-existent object. +.Pp +Additionally, a socket operation itself may fail with one of the errors +specified in +.Xr socket 2 +, +.Xr recv 2 +or +.Xr send 2 +. +.El +.Sh SEE ALSO +.Xr genetlink 4 , +.Xr rtnetlink 4 +.Rs +.%A "J. Salim" +.%A "H. Khosravi" +.%A "A. Kleen" +.%A "A. Kuznetsov" +.%T "Linux Netlink as an IP Services Protocol" +.%O "RFC 3549" +.Re +.Sh HISTORY +The netlink protocol appeared in +.Fx 14.0 . +.Sh AUTHORS +The netlink was implemented by +.An -nosplit +.An Alexander Chernikov Aq Mt melifaro@FreeBSD.org . +It was derived from the Google Summer of Code 2021 project by +.An Ng Peng Nam Sean . diff --git a/share/man/man4/rtnetlink.4 b/share/man/man4/rtnetlink.4 new file mode 100644 index 000000000000..9f20671719f0 --- /dev/null +++ b/share/man/man4/rtnetlink.4 @@ -0,0 +1,519 @@ +.\" +.\" Copyright (C) 2022 Alexander Chernikov . +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd November 1, 2022 +.Dt RTNETLINK 4 +.Os +.Sh NAME +.Nm RTNetlink +.Nd Network configuration-specific Netlink family +.Sh SYNOPSIS +.In netlink/netlink.h +.In netlink/netlink_route.h +.Ft int +.Fn socket AF_NETLINK SOCK_DGRAM NETLINK_ROUTE +.Sh DESCRIPTION +The +.Dv NETLINK_ROUTE +family aims to be the primary configuration mechanism for all +network-related tasks. +Currently it supports configuring interfaces, interface addresses, routes, +nexthops and arp/ndp neighbors. +.Sh ROUTES +All route configuration messages share the common header: +.Bd -literal +struct rtmsg { + unsigned char rtm_family; /* address family */ + unsigned char rtm_dst_len; /* Prefix length */ + unsigned char rtm_src_len; /* Deprecated, set to 0 */ + unsigned char rtm_tos; /* Type of service (not used) */ + unsigned char rtm_table; /* deprecated, set to 0 */ + unsigned char rtm_protocol; /* Routing protocol id (RTPROT_) */ + unsigned char rtm_scope; /* Route distance (RT_SCOPE_) */ + unsigned char rtm_type; /* Route type (RTN_) */ + unsigned rtm_flags; /* Route flags (not supported) */ +}; +.Ed +.Pp +The +.Va rtm_family +specifies the route family to be operated on. 
+Currently, +.Dv AF_INET6 +and +.Dv AF_INET +are the only supported families. +The route prefix length is stored in +.Va rtm_dst_len +. +The caller should set the originator identity (one of the +.Dv RTPROT_ +values) in +.Va rtm_protocol +. +It is useful for users and for the application itself, allowing for easy +identification of self-originated routes. +The route scope has to be set via +.Va rtm_scope +field. +The supported values are: +.Bd -literal -offset indent -compact +RT_SCOPE_UNIVERSE Global scope +RT_SCOPE_LINK Link scope +.Ed +.Pp +Route type needs to be set. +The defined values are: +.Bd -literal -offset indent -compact +RTN_UNICAST Unicast route +RTN_MULTICAST Multicast route +RTN_BLACKHOLE Drops traffic towards destination +RTN_PROHIBIT Drops traffic and sends reject +.Ed +.Pp +The following messages are supported: +.Ss RTM_NEWROUTE +Adds a new route. +All NL flags are supported. +Extending a multipath route requires NLM_F_APPEND flag. +.Ss RTM_DELROUTE +Tries to delete a route. +The route is specified using a combination of +.Dv RTA_DST +TLV and +.Va rtm_dst_len . +.Ss RTM_GETROUTE +Fetches a single route or all routes in the current VNET, depending on the +.Dv NLM_F_DUMP +flag. +Each route is reported as +.Dv RTM_NEWROUTE +message. +The following filters are recognised by the kernel: +.Pp +.Bd -literal -offset indent -compact +rtm_family required family or AF_UNSPEC +RTA_TABLE fib number or RT_TABLE_UNSPEC to return all fibs +.Ed +.Ss TLVs +.Bl -tag -width indent +.It Dv RTA_DST +(binary) IPv4/IPv6 address, depending on the +.Va rtm_family . +.It Dv RTA_OIF +(uint32_t) transmit interface index. +.It Dv RTA_GATEWAY +(binary) IPv4/IPv6 gateway address, depending on the +.Va rtm_family . +.It Dv RTA_METRICS +(nested) Container attribute, listing route properties. +The only supported sub-attribute is +.Dv RTAX_MTU , which stores path MTU as uint32_t. +.It Dv RTA_MULTIPATH +This attribute contains multipath route nexthops with their weights. 
+These nexthops are represented as a sequence of +.Va rtnexthop +structures, each followed by +.Dv RTA_GATEWAY +or +.Dv RTA_VIA +attributes. +.Bd -literal +struct rtnexthop { + unsigned short rtnh_len; + unsigned char rtnh_flags; + unsigned char rtnh_hops; /* nexthop weight */ + int rtnh_ifindex; +}; +.Ed +.Pp +The +.Va rtnh_len +field specifies the total nexthop info length, including both +.Va struct rtnexthop +and the following TLVs. +The +.Va rtnh_hops +field stores relative nexthop weight, used for load balancing between group +members. +The +.Va rtnh_ifindex +field contains the index of the transmit interface. +.Pp +The following TLVs can follow the structure: +.Bd -literal -offset indent -compact +RTA_GATEWAY IPv4/IPv6 nexthop address of the gateway +RTA_VIA IPv6 nexthop address for IPv4 route +RTA_KNH_ID Kernel-specific index of the nexthop +.Ed +.It Dv RTA_KNH_ID +(uint32_t) (FreeBSD-specific) Auto-allocated kernel index of the nexthop. +.It Dv RTA_RTFLAGS +(uint32_t) (FreeBSD-specific) rtsock route flags. +.It Dv RTA_TABLE +(uint32_t) Fib number of the route. +Default route table is +.Dv RT_TABLE_MAIN . +To explicitly specify "all tables" one needs to set the value to +.Dv RT_TABLE_UNSPEC . +.It Dv RTA_EXPIRES +(uint32_t) seconds till path expiration. +.It Dv RTA_NH_ID +(uint32_t) userland nexthop or nexthop group index. 
+.El +.Ss Groups +The following groups are defined: +.Bd -literal -offset indent -compact +RTNLGRP_IPV4_ROUTE Notifies on IPv4 route arrival/removal/change +RTNLGRP_IPV6_ROUTE Notifies on IPv6 route arrival/removal/change +.Ed +.Sh NEXTHOPS +All nexthop/nexthop group configuration messages share the common header: +.Bd -literal +struct nhmsg { + unsigned char nh_family; /* transport family */ + unsigned char nh_scope; /* ignored on RX, filled by kernel */ + unsigned char nh_protocol; /* Routing protocol that installed nh */ + unsigned char resvd; + unsigned int nh_flags; /* RTNH_F_* flags from route.h */ +}; +.Ed +The +.Va nh_family +specifies the gateway address family. +It can be different from the route address family for IPv4 routes with IPv6 +nexthops. +The +.Va nh_protocol +is similar to +.Va rtm_protocol +field, which designates originator application identity. +.Pp +The following messages are supported: +.Ss RTM_NEWNEXTHOP +Creates a new nexthop or nexthop group. +.Ss RTM_DELNEXTHOP +Deletes a nexthop or nexthop group. +The required object is specified by the +.Dv RTA_NH_ID +attribute. +.Ss RTM_GETNEXTHOP +Fetches a single nexthop or all nexthops/nexthop groups, depending on the +.Dv NLM_F_DUMP +flag. +The following filters are recognised by the kernel: +.Pp +.Bd -literal -offset indent -compact +RTA_NH_ID nexthop or nexthop group id +NHA_GROUPS match only nexthop groups +.Ed +.Ss TLVs +.Bl -tag -width indent +.It Dv RTA_NH_ID +(uint32_t) Nexthop index used to identify a particular nexthop or nexthop group. +Should be provided by userland at the nexthop creation time. +.It Dv NHA_GROUP +This attribute designates the nexthop group and contains all of its nexthops +and their relative weights. 
+The attribute consists of a list of +.Va nexthop_grp +structures: +.Bd -literal +struct nexthop_grp { + uint32_t id; /* nexthop userland index */ + uint8_t weight; /* weight of this nexthop */ + uint8_t resvd1; + uint16_t resvd2; +}; +.Ed +.It Dv NHA_GROUP_TYPE +(uint16_t) Nexthop group type, set to one of the following types: +.Bd -literal -offset indent -compact +NEXTHOP_GRP_TYPE_MPATH default multipath group +.Ed +.It Dv NHA_BLACKHOLE +(flag) Marks the nexthop as blackhole. +.It Dv NHA_OIF +(uint32_t) Transmit interface index of the nexthop. +.It Dv NHA_GATEWAY +(binary) IPv4/IPv6 gateway address. +.It Dv NHA_GROUPS +(flag) Matches nexthop groups during dump. +.El +.Ss Groups +The following groups are defined: +.Bd -literal -offset indent -compact +RTNLGRP_NEXTHOP Notifies on nexthop/groups arrival/removal/change +.Ed +.Sh INTERFACES +All interface configuration messages share the common header: +.Bd -literal +struct ifinfomsg { + unsigned char ifi_family; /* not used, set to 0 */ + unsigned char __ifi_pad; + unsigned short ifi_type; /* ARPHRD_* */ + int ifi_index; /* Interface index */ + unsigned ifi_flags; /* IFF_* flags */ + unsigned ifi_change; /* IFF_* change mask */ +}; +.Ed +.Ss RTM_NEWLINK +Creates a new interface. +The only mandatory TLV is +.Dv IFLA_IFNAME . +.Ss RTM_DELLINK +Deletes the interface specified by +.Dv IFLA_IFNAME . +.Ss RTM_GETLINK +Fetches a single interface or all interfaces in the current VNET, depending on the +.Dv NLM_F_DUMP +flag. +Each interface is reported as a +.Dv RTM_NEWLINK +message. +The following filters are recognised by the kernel: +.Pp +.Bd -literal -offset indent -compact +ifi_index interface index +IFLA_IFNAME interface name +IFLA_ALT_IFNAME interface name +.Ed +.Ss TLVs +.Bl -tag -width indent +.It Dv IFLA_ADDRESS +(binary) Link-level interface address (MAC). +.It Dv IFLA_BROADCAST +(binary) (readonly) Link-level broadcast address. +.It Dv IFLA_IFNAME +(string) New interface name. 
+.It Dv IFLA_LINK +(uint32_t) (readonly) Interface index. +.It Dv IFLA_MASTER +(uint32_t) Parent interface index. +.It Dv IFLA_LINKINFO +(nested) Interface type-specific attributes: +.Bd -literal -offset indent -compact +IFLA_INFO_KIND (string) interface type ("vlan") +IFLA_INFO_DATA (nested) custom attributes +.Ed +The following types and attributes are supported: +.Bl -tag -width indent +.It Dv vlan +.Bd -literal -offset indent -compact +IFLA_VLAN_ID (uint16_t) 802.1Q vlan id +IFLA_VLAN_PROTOCOL (uint16_t) Protocol: ETHERTYPE_VLAN or ETHERTYPE_QINQ +.Ed +.El +.It Dv IFLA_OPERSTATE +(uint8_t) Interface operational state per RFC 2863. +Can be one of the following: +.Bd -literal -offset indent -compact +IF_OPER_UNKNOWN status can not be determined +IF_OPER_NOTPRESENT some (hardware) component not present +IF_OPER_DOWN down +IF_OPER_LOWERLAYERDOWN some lower-level interface is down +IF_OPER_TESTING in some test mode +IF_OPER_DORMANT "up" but waiting for some condition (802.1X) +IF_OPER_UP ready to pass packets +.Ed +.It Dv IFLA_STATS64 +(readonly) Consists of the following 64-bit counters structure: +.Bd -literal +struct rtnl_link_stats64 { + uint64_t rx_packets; /* total RX packets (IFCOUNTER_IPACKETS) */ + uint64_t tx_packets; /* total TX packets (IFCOUNTER_OPACKETS) */ + uint64_t rx_bytes; /* total RX bytes (IFCOUNTER_IBYTES) */ + uint64_t tx_bytes; /* total TX bytes (IFCOUNTER_OBYTES) */ + uint64_t rx_errors; /* RX errors (IFCOUNTER_IERRORS) */ + uint64_t tx_errors; /* TX errors (IFCOUNTER_OERRORS) */ + uint64_t rx_dropped; /* RX drop (no space in ring/no bufs) (IFCOUNTER_IQDROPS) */ + uint64_t tx_dropped; /* TX drop (IFCOUNTER_OQDROPS) */ + uint64_t multicast; /* RX multicast packets (IFCOUNTER_IMCASTS) */ + uint64_t collisions; /* not supported */ + uint64_t rx_length_errors; /* not supported */ + uint64_t rx_over_errors; /* not supported */ + uint64_t rx_crc_errors; /* not supported */ + uint64_t rx_frame_errors; /* not supported */ + uint64_t rx_fifo_errors; 
/* not supported */ + uint64_t rx_missed_errors; /* not supported */ + uint64_t tx_aborted_errors; /* not supported */ + uint64_t tx_carrier_errors; /* not supported */ + uint64_t tx_fifo_errors; /* not supported */ + uint64_t tx_heartbeat_errors; /* not supported */ + uint64_t tx_window_errors; /* not supported */ + uint64_t rx_compressed; /* not supported */ + uint64_t tx_compressed; /* not supported */ + uint64_t rx_nohandler; /* dropped due to no proto handler (IFCOUNTER_NOPROTO) */ +}; +.Ed +.El +.Ss Groups +The following groups are defined: +.Bd -literal -offset indent -compact +RTNLGRP_LINK Notifies on interface arrival/removal/change +.Ed +.Sh INTERFACE ADDRESSES +All interface address configuration messages share the common header: +.Bd -literal +struct ifaddrmsg { + uint8_t ifa_family; /* Address family */ + uint8_t ifa_prefixlen; /* Prefix length */ + uint8_t ifa_flags; /* Address-specific flags */ + uint8_t ifa_scope; /* Address scope */ + uint32_t ifa_index; /* Link ifindex */ +}; +.Ed +.Pp +The +.Va ifa_family +field specifies the address family of the interface address. +The +.Va ifa_prefixlen +field specifies the prefix length, if applicable for the address family. +The +.Va ifa_index +field specifies the interface index of the target interface. +.Ss RTM_NEWADDR +Not supported. +.Ss RTM_DELADDR +Not supported. +.Ss RTM_GETADDR +.Ss TLVs +.Bl -tag -width indent +.It Dv IFA_ADDRESS +(binary) masked interface address or destination address for p2p interfaces. 
+.It Dv IFA_LOCAL +(binary) local interface address. +.It Dv IFA_BROADCAST +(binary) broadcast interface address. +.El +.Ss Groups +The following groups are defined: +.Bd -literal -offset indent -compact +RTNLGRP_IPV4_IFADDR Notifies on IPv4 ifaddr arrival/removal/change +RTNLGRP_IPV6_IFADDR Notifies on IPv6 ifaddr arrival/removal/change +.Ed +.Sh NEIGHBORS +All neighbor configuration messages share the common header: +.Bd -literal +struct ndmsg { + uint8_t ndm_family; + uint8_t ndm_pad1; + uint16_t ndm_pad2; + int32_t ndm_ifindex; + uint16_t ndm_state; + uint8_t ndm_flags; + uint8_t ndm_type; +}; +.Ed +.Pp +The +.Va ndm_family +field specifies the address family (IPv4 or IPv6) of the neighbor. +The +.Va ndm_ifindex +field specifies the interface to operate on. +The +.Va ndm_state +field represents the entry state according to the neighbor model. +The state can be one of the following: +.Bd -literal -offset indent -compact +NUD_INCOMPLETE no lladdr, address resolution in progress +NUD_REACHABLE reachable & recently resolved +NUD_STALE has lladdr but it's stale +NUD_DELAY has lladdr, is stale, probes delayed +NUD_PROBE has lladdr, is stale, probes sent +NUD_FAILED unused +.Ed +.Pp *** 68 LINES SKIPPED ***