Re: git: 7e5bf68495cc - main - netlink: add netlink support
Date: Thu, 09 Mar 2023 15:38:14 UTC
On 1 Oct 2022, at 16:19, Alexander V. Chernikov wrote:
> The branch main has been updated by melifaro:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=7e5bf68495cc0a8c9793a338a8a02009a7f6dbb6
>
> commit 7e5bf68495cc0a8c9793a338a8a02009a7f6dbb6
> Author: Alexander V. Chernikov <melifaro@FreeBSD.org>
> AuthorDate: 2022-01-20 21:39:21 +0000
> Commit: Alexander V. Chernikov <melifaro@FreeBSD.org>
> CommitDate: 2022-10-01 14:15:35 +0000
>
> netlink: add netlink support
>
> Netlinks is a communication protocol currently used in Linux
> kernel to modify,
> read and subscribe for nearly all networking state. Interfaces,
> addresses, routes,
> firewall, fibs, vnets, etc are controlled via netlink.
> It is async, TLV-based protocol, providing 1-1 and 1-many
> communications.
>
> The current implementation supports the subset of NETLINK_ROUTE
> family. To be more specific, the following is supported:
> * Dumps:
>  - routes
>  - nexthops / nexthop groups
>  - interfaces
>  - interface addresses
>  - neighbors (arp/ndp)
> * Notifications:
>  - interface arrival/departure
>  - interface address arrival/departure
>  - route addition/deletion
> * Modifications:
>  - adding/deleting routes
>  - adding/deleting nexthops/nexthops groups
>  - adding/deleting neghbors
>  - adding/deleting interfaces (basic support only)
> * Rtsock interaction
>  - route events are bridged both ways
>
> The implementation also supports the NETLINK_GENERIC family
> framework.
>
> Implementation notes:
> Netlink is implemented via loadable/unloadable kernel module,
> not touching many kernel parts.
> Each netlink socket uses dedicated taskqueue to support async
> operations
> that can sleep, such as interface creation. All message
> processing is
> performed within these taskqueues.
>
> Compatibility:
> Most of the Netlink data models specified above maps to FreeBSD
> concepts
> nicely. Unmodified ip(8) binary correctly works with
> interfaces, addresses, routes, nexthops and nexthop groups. Some
> software such as net/bird require header-only modifications to
> compile
> and work with FreeBSD netlink.
>
> Reviewed by: imp
> Differential Revision: https://reviews.freebsd.org/D36002
> MFC after: 2 months
> ---
> etc/mtree/BSD.include.dist | 4 +
> sys/modules/Makefile | 1 +
> sys/modules/netlink/Makefile | 17 +
> sys/net/route.c | 11 +
> sys/net/route/route_ctl.h | 7 +
> sys/net/rtsock.c | 42 ++
> sys/netlink/netlink.h | 257 +++++++++
> sys/netlink/netlink_ctl.h | 102 ++++
> sys/netlink/netlink_debug.h | 82 +++
> sys/netlink/netlink_domain.c | 689 +++++++++++++++++++++++
> sys/netlink/netlink_generic.c | 472 ++++++++++++++++
> sys/netlink/netlink_generic.h | 112 ++++
> sys/netlink/netlink_io.c | 528 ++++++++++++++++++
> sys/netlink/netlink_linux.h | 54 ++
> sys/netlink/netlink_message_parser.c | 472 ++++++++++++++++
> sys/netlink/netlink_message_parser.h | 270 +++++++++
> sys/netlink/netlink_message_writer.c | 686 +++++++++++++++++++++++
> sys/netlink/netlink_message_writer.h | 250 +++++++++
> sys/netlink/netlink_module.c | 228 ++++++++
> sys/netlink/netlink_route.c | 135 +++++
> sys/netlink/netlink_route.h | 43 ++
> sys/netlink/netlink_var.h | 142 +++++
> sys/netlink/route/common.h | 213 ++++++++
> sys/netlink/route/iface.c | 857
> +++++++++++++++++++++++++++++
> sys/netlink/route/iface_drivers.c | 165 ++++++
> sys/netlink/route/ifaddrs.h | 90 +++
> sys/netlink/route/interface.h | 245 +++++++++
> sys/netlink/route/neigh.c | 571 +++++++++++++++++++
> sys/netlink/route/neigh.h | 105 ++++
> sys/netlink/route/nexthop.c | 1000
> ++++++++++++++++++++++++++++++++++
> sys/netlink/route/nexthop.h | 102 ++++
> sys/netlink/route/route.c | 972
> +++++++++++++++++++++++++++++++++
> sys/netlink/route/route.h | 366 +++++++++++++
> sys/netlink/route/route_var.h | 101 ++++
> 34 files changed, 9391 insertions(+)
>
> diff --git a/sys/netlink/netlink.h b/sys/netlink/netlink.h
> new file mode 100644
> index 000000000000..6a68dcec1382
> --- /dev/null
> +++ b/sys/netlink/netlink.h
> @@ -0,0 +1,257 @@
> +/*-
> + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> + *
> + * Copyright (c) 2021 Ng Peng Nam Sean
> + * Copyright (c) 2022 Alexander V. Chernikov <melifaro@FreeBSD.org>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above
> copyright
> + * notice, this list of conditions and the following disclaimer in
> the
> + * documentation and/or other materials provided with the
> distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
> AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
> THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE
> LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
> GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
> ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Copyright (C) The Internet Society (2003). All Rights Reserved.
> + *
> + * This document and translations of it may be copied and furnished
> to
> + * others, and derivative works that comment on or otherwise explain
> it
> + * or assist in its implementation may be prepared, copied, published
> + * and distributed, in whole or in part, without restriction of any
> + * kind, provided that the above copyright notice and this paragraph
> are
> + * included on all such copies and derivative works. However, this
> + * document itself may not be modified in any way, such as by
> removing
> + * the copyright notice or references to the Internet Society or
> other
> + * Internet organizations, except as needed for the purpose of
> + * developing Internet standards in which case the procedures for
> + * copyrights defined in the Internet Standards process must be
> + * followed, or as required to translate it into languages other than
> + * English.
> + *
> + * The limited permissions granted above are perpetual and will not
> be
> + * revoked by the Internet Society or its successors or assignees.
> + *
> + * This document and the information contained herein is provided on
> an
> + * "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
> ENGINEERING
> + * TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
> + * BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
> + * HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> +
> + */
> +
> +/*
> + * This file contains structures and constants for RFC 3549 (Netlink)
> + * protocol. Some values have been taken from Linux implementation.
> + */
> +
> +#ifndef _NETLINK_NETLINK_H_
> +#define _NETLINK_NETLINK_H_
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +
> +struct sockaddr_nl {
> + uint8_t nl_len; /* sizeof(sockaddr_nl) */
> + sa_family_t nl_family; /* netlink family */
> + uint16_t nl_pad; /* reserved, set to 0 */
> + uint32_t nl_pid; /* desired port ID, 0 for auto-select */
> + uint32_t nl_groups; /* multicast groups mask to bind to */
> +};
> +
> +#define SOL_NETLINK 270
> +
> +/* Netlink socket options */
> +#define NETLINK_ADD_MEMBERSHIP 1 /* Subscribe for the specified
> group notifications */
> +#define NETLINK_DROP_MEMBERSHIP 2 /* Unsubscribe from the specified
> group */
> +#define NETLINK_PKTINFO 3 /* XXX: not supported */
> +#define NETLINK_BROADCAST_ERROR 4 /* XXX: not supported */
> +#define NETLINK_NO_ENOBUFS 5 /* XXX: not supported */
> +#define NETLINK_RX_RING 6 /* XXX: not supported */
> +#define NETLINK_TX_RING 7 /* XXX: not supported */
> +#define NETLINK_LISTEN_ALL_NSID 8 /* XXX: not supported */
> +
> +#define NETLINK_LIST_MEMBERSHIPS 9
> +#define NETLINK_CAP_ACK 10 /* Send only original message header in
> the reply */
> +#define NETLINK_EXT_ACK 11 /* Ack support for receiving additional
> TLVs in ack */
> +#define NETLINK_GET_STRICT_CHK 12 /* Strict header checking */
> +
> +
> +/*
> + * RFC 3549, 2.3.2 Netlink Message Header
> + */
> +struct nlmsghdr {
> + uint32_t nlmsg_len; /* Length of message including header */
> + uint16_t nlmsg_type; /* Message type identifier */
> + uint16_t nlmsg_flags; /* Flags (NLM_F_) */
> + uint32_t nlmsg_seq; /* Sequence number */
> + uint32_t nlmsg_pid; /* Sending process port ID */
> +};
> +
> +/*
> + * RFC 3549, 2.3.2 standard flag bits (nlmsg_flags)
> + */
> +#define NLM_F_REQUEST 0x01 /* Indicateds request to kernel */
> +#define NLM_F_MULTI 0x02 /* Message is part of a group terminated by
> NLMSG_DONE msg */
> +#define NLM_F_ACK 0x04 /* Reply with ack message containing
> resulting error code */
> +#define NLM_F_ECHO 0x08 /* (not supported) Echo this request back */
> +#define NLM_F_DUMP_INTR 0x10 /* Dump was inconsistent due to
> sequence change */
> +#define NLM_F_DUMP_FILTERED 0x20 /* Dump was filtered as requested */
> +
> +/*
> + * RFC 3549, 2.3.2 Additional flag bits for GET requests
> + */
> +#define NLM_F_ROOT 0x100 /* Return the complete table */
> +#define NLM_F_MATCH 0x200 /* Return all entries matching criteria */
> +#define NLM_F_ATOMIC 0x400 /* Return an atomic snapshot (ignored) */
> +#define NLM_F_DUMP (NLM_F_ROOT | NLM_F_MATCH)
> +
> +/*
> + * RFC 3549, 2.3.2 Additional flag bits for NEW requests
> + */
> +#define NLM_F_REPLACE 0x100 /* Replace existing matching config
> object */
> +#define NLM_F_EXCL 0x200 /* Don't replace the object if exists */
> +#define NLM_F_CREATE 0x400 /* Create if it does not exist */
> +#define NLM_F_APPEND 0x800 /* Add to end of list */
> +
> +/* Modifiers to DELETE requests */
> +#define NLM_F_NONREC 0x100 /* Do not delete recursively */
> +
> +/* Flags for ACK message */
> +#define NLM_F_CAPPED 0x100 /* request was capped */
> +#define NLM_F_ACK_TLVS 0x200 /* extended ACK TVLs were included */
> +
> +/*
> + * RFC 3549, 2.3.2 standard message types (nlmsg_type).
> + */
> +#define NLMSG_NOOP 0x1 /* Message is ignored. */
> +#define NLMSG_ERROR 0x2 /* reply error code reporting */
> +#define NLMSG_DONE 0x3 /* Message terminates a multipart message. */
> +#define NLMSG_OVERRUN 0x4 /* overrun detected, data is lost */
> +
> +#define NLMSG_MIN_TYPE 0x10 /* < 0x10: reserved control messages */
> +
> +/*
> + * Defition of numbers assigned to the netlink subsystems.
> + */
> +#define NETLINK_ROUTE 0 /* Routing/device hook */
> +#define NETLINK_UNUSED 1 /* not supported */
> +#define NETLINK_USERSOCK 2 /* not supported */
> +#define NETLINK_FIREWALL 3 /* not supported */
> +#define NETLINK_SOCK_DIAG 4 /* not supported */
> +#define NETLINK_NFLOG 5 /* not supported */
> +#define NETLINK_XFRM 6 /* (not supported) PF_SETKEY */
> +#define NETLINK_SELINUX 7 /* not supported */
> +#define NETLINK_ISCSI 8 /* not supported */
> +#define NETLINK_AUDIT 9 /* not supported */
> +#define NETLINK_FIB_LOOKUP 10 /* not supported */
> +#define NETLINK_CONNECTOR 11 /* not supported */
> +#define NETLINK_NETFILTER 12 /* not supported */
> +#define NETLINK_IP6_FW 13 /* not supported */
> +#define NETLINK_DNRTMSG 14 /* not supported */
> +#define NETLINK_KOBJECT_UEVENT 15 /* not supported */
> +#define NETLINK_GENERIC 16 /* Generic netlink (dynamic families) */
> +

So, really fun thing here, we also have `#define NETLINK_GENERIC 0` in
sys/net/if_mib.h. (And that’s exposed to userspace, and used there, so we
can’t just change that.)

Which leads to much fun if we decided to do something like including the
netlink_generic header in other headers, so we can define messages that
contain the genlmsghdr struct. I ran into that experimenting with netlink
for carp(4).

I think I can work around it by adding a separate ip_carp_nl.h header for
the netlink stuff, but sooner or later this is going to bite us.

Kristof
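
To make the clash concrete, here is a minimal sketch in plain C. It is not taken from the tree: the two values (16 and 0) are the ones mentioned in this mail, but IFMIB_NETLINK_GENERIC and both helper functions are hypothetical names made up for illustration. The if_mib.h value gets a different name here only because defining NETLINK_GENERIC twice with different values in one translation unit does not compile cleanly, which is exactly the problem.

```c
/* Value from sys/netlink/netlink.h, as quoted in the diff above. */
#define NETLINK_GENERIC 16

/*
 * Value of NETLINK_GENERIC in sys/net/if_mib.h. The name
 * IFMIB_NETLINK_GENERIC is hypothetical; the real header uses the
 * same NETLINK_GENERIC name, which is what collides.
 */
#define IFMIB_NETLINK_GENERIC 0

/* The two meanings really are different numbers. */
_Static_assert(NETLINK_GENERIC != IFMIB_NETLINK_GENERIC,
    "netlink protocol number and if_mib constant differ");

/*
 * Hypothetical helper: returns 1 when the NETLINK_GENERIC visible in
 * this translation unit is the netlink protocol number.
 */
int
netlink_generic_is_protocol(void)
{
	return (NETLINK_GENERIC == 16);
}

/* Hypothetical helper: returns the if_mib-side value. */
int
ifmib_generic_value(void)
{
	return (IFMIB_NETLINK_GENERIC);
}
```

A header that only netlink consumers include (the ip_carp_nl.h idea) keeps the value 16 out of any file that also needs the if_mib.h meaning, so each translation unit only ever sees one definition.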