svn commit: r331347 - in head: etc/mtree include sys/conf sys/dev/tcp_log sys/kern sys/netinet usr.bin/netstat
Ruslan Bukin
ruslan.bukin at cl.cam.ac.uk
Thu Mar 22 18:23:15 UTC 2018
Look at these build logs:
https://ci.freebsd.org/job/FreeBSD-head-mips-build/lastBuild/console
https://ci.freebsd.org/job/FreeBSD-head-powerpc-build/lastBuild/console
Example:
make -j5 TARGET=mips TARGET_ARCH=mipsel kernel-toolchain
make -j5 TARGET=mips TARGET_ARCH=mipsel KERNCONF=CANNA buildkernel
Ruslan
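
For readers unfamiliar with the failure mode discussed below: atomic_fetchadd_64 atomically adds to a 64-bit value and returns the value it held before, and a 32-bit target without a native 64-bit atomic cannot provide it directly. What follows is only a minimal user-space sketch of a lock-based fallback, assuming POSIX threads. The name fetchadd_64_locked and the single global mutex are hypothetical; they are not part of the commit or of FreeBSD's machine/atomic.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical fallback: one global mutex serializes every 64-bit update. */
static pthread_mutex_t fetchadd_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Add v to *p and return the value *p held before the addition,
 * mirroring the fetch-and-add contract without a native 64-bit atomic.
 */
static uint64_t
fetchadd_64_locked(volatile uint64_t *p, uint64_t v)
{
	uint64_t old;

	assert(p != NULL);
	pthread_mutex_lock(&fetchadd_lock);
	old = *p;
	*p = old + v;
	pthread_mutex_unlock(&fetchadd_lock);
	return (old);
}
```

A single lock is the simplest choice but serializes all callers; whether that cost is acceptable in a packet path is a separate question from whether the build succeeds.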
On Thu, Mar 22, 2018 at 03:39:23PM +0000, Jonathan Looney wrote:
> A tinderbox build didn't complain about atomic_fetchadd_64, so I assume it
> is OK.
> Yes, this can be made optional, if there is a need for that.
> Jonathan
> On Thu, Mar 22, 2018 at 2:22 PM, Ruslan Bukin
> <ruslan.bukin at cl.cam.ac.uk> wrote:
>
> Also, can this be pluggable?
> It looks like an optional device, which means it could free up some
> space in embedded environments when unused.
> Ruslan
> On Thu, Mar 22, 2018 at 02:16:06PM +0000, Ruslan Bukin wrote:
> > We don't have atomic_fetchadd_64 for mips32 I think
> >
> > Ruslan
> >
> > On Thu, Mar 22, 2018 at 09:40:08AM +0000, Jonathan T. Looney wrote:
> > > Author: jtl
> > > Date: Thu Mar 22 09:40:08 2018
> > > New Revision: 331347
> > > > URL: https://svnweb.freebsd.org/changeset/base/331347
> > >
> > > Log:
> > > > Add the "TCP Blackbox Recorder" which we discussed at the developer
> > > > summits at BSDCan and BSDCam in 2017.
> > > >
> > > > The TCP Blackbox Recorder allows you to capture events on a TCP connection
> > > > in a ring buffer. It stores metadata with the event. It optionally stores
> > > > the TCP header associated with an event (if the event is associated with a
> > > > packet) and also optionally stores information on the sockets.
> > > >
> > > > It supports setting a log ID on a TCP connection and using this to correlate
> > > > multiple connections that share a common log ID.
> > > >
> > > > You can log connections in different modes. If you are doing a coordinated
> > > > test with a particular connection, you may tell the system to put it in
> > > > mode 4 (continuous dump). Or, if you just want to monitor for errors, you
> > > > can put it in mode 1 (ring buffer) and dump all the ring buffers associated
> > > > with the connection ID when we receive an error signal for that connection
> > > > ID. You can set a default mode that will be applied to a particular ratio
> > > > of incoming connections. You can also manually set a mode using a socket
> > > > option.
> > > >
> > > > This commit includes only basic probes. rrs@ has added quite an abundance
> > > > of probes in his TCP development work. He plans to commit those soon.
> > > >
> > > > There are user-space programs which we plan to commit as ports. These read
> > > > the data from the log device and output pcapng files, and then let you
> > > > analyze the data (and metadata) in the pcapng files.
> > >
> > > Reviewed by: gnn (previous version)
> > > Obtained from: Netflix, Inc.
> > > Relnotes: yes
> > > > Differential Revision: https://reviews.freebsd.org/D11085
> > >
> > > Added:
> > > head/sys/dev/tcp_log/
> > > head/sys/dev/tcp_log/tcp_log_dev.c (contents, props changed)
> > > head/sys/dev/tcp_log/tcp_log_dev.h (contents, props changed)
> > > head/sys/netinet/tcp_log_buf.c (contents, props changed)
> > > head/sys/netinet/tcp_log_buf.h (contents, props changed)
> > > Modified:
> > > head/etc/mtree/BSD.include.dist
> > > head/include/Makefile
> > > head/sys/conf/files
> > > head/sys/kern/subr_witness.c
> > > head/sys/netinet/tcp.h
> > > head/sys/netinet/tcp_input.c
> > > head/sys/netinet/tcp_output.c
> > > head/sys/netinet/tcp_subr.c
> > > head/sys/netinet/tcp_timer.c
> > > head/sys/netinet/tcp_usrreq.c
> > > head/sys/netinet/tcp_var.h
> > > head/usr.bin/netstat/inet.c
> > > head/usr.bin/netstat/main.c
> > > head/usr.bin/netstat/netstat.1
> > > head/usr.bin/netstat/netstat.h
> > >
> > > Modified: head/etc/mtree/BSD.include.dist
> > >
> ==============================================================================
> > > --- head/etc/mtree/BSD.include.dist Thu Mar 22 08:32:39 2018
> (r331346)
> > > +++ head/etc/mtree/BSD.include.dist Thu Mar 22 09:40:08 2018
> (r331347)
> > > @@ -158,6 +158,8 @@
> > > ..
> > > speaker
> > > ..
> > > + tcp_log
> > > + ..
> > > usb
> > > ..
> > > vkbd
> > >
> > > Modified: head/include/Makefile
> > >
> ==============================================================================
> > > --- head/include/Makefile Thu Mar 22 08:32:39 2018
> (r331346)
> > > +++ head/include/Makefile Thu Mar 22 09:40:08 2018
> (r331347)
> > > @@ -47,7 +47,7 @@ LSUBDIRS= cam/ata cam/mmc cam/nvme cam/scsi \
> > > dev/hwpmc dev/hyperv \
> > > dev/ic dev/iicbus dev/io dev/lmc dev/mfi dev/mmc dev/nvme \
> > > dev/ofw dev/pbio dev/pci ${_dev_powermac_nvram} dev/ppbus
> dev/smbus \
> > > - dev/speaker dev/vkbd dev/wi \
> > > + dev/speaker dev/tcp_log dev/vkbd dev/wi \
> > > fs/devfs fs/fdescfs fs/msdosfs fs/nandfs fs/nfs fs/nullfs \
> > > fs/procfs fs/smbfs fs/udf fs/unionfs \
> > > geom/cache geom/concat geom/eli geom/gate geom/journal
> geom/label \
> > >
> > > Modified: head/sys/conf/files
> > >
> ==============================================================================
> > > --- head/sys/conf/files Thu Mar 22 08:32:39 2018
> (r331346)
> > > +++ head/sys/conf/files Thu Mar 22 09:40:08 2018
> (r331347)
> > > @@ -3161,6 +3161,7 @@ dev/syscons/star/star_saver.c optional
> star_saver
> > > dev/syscons/syscons.c optional sc
> > > dev/syscons/sysmouse.c optional sc
> > > dev/syscons/warp/warp_saver.c optional warp_saver
> > > +dev/tcp_log/tcp_log_dev.c optional inet | inet6
> > > dev/tdfx/tdfx_linux.c optional tdfx_linux tdfx
> compat_linux
> > > dev/tdfx/tdfx_pci.c optional tdfx pci
> > > dev/ti/if_ti.c optional ti pci
> > > @@ -4309,6 +4310,7 @@ netinet/tcp_debug.c optional
> tcpdebug
> > > netinet/tcp_fastopen.c optional inet
> tcp_rfc7413 | inet6 tcp_rfc7413
> > > netinet/tcp_hostcache.c optional inet | inet6
> > > netinet/tcp_input.c optional inet | inet6
> > > +netinet/tcp_log_buf.c optional inet | inet6
> > > netinet/tcp_lro.c optional inet | inet6
> > > netinet/tcp_output.c optional inet | inet6
> > > netinet/tcp_offload.c optional tcp_offload
> inet | tcp_offload inet6
> > >
> > > Added: head/sys/dev/tcp_log/tcp_log_dev.c
> > >
> ==============================================================================
> > > --- /dev/null 00:00:00 1970 (empty, because file is
> newly added)
> > > +++ head/sys/dev/tcp_log/tcp_log_dev.c Thu Mar 22 09:40:08
> 2018 (r331347)
> > > @@ -0,0 +1,521 @@
> > > +/*-
> > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> > > + *
> > > + * Copyright (c) 2016-2017
> > > + * Netflix Inc. All rights reserved.
> > > + *
> > > + * Redistribution and use in source and binary forms, with or
> without
> > > + * modification, are permitted provided that the following
> conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above
> copyright
> > > + * notice, this list of conditions and the following
> disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above
> copyright
> > > + * notice, this list of conditions and the following
> disclaimer in the
> > > + * documentation and/or other materials provided with the
> distribution.
> > > + *
> > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
> IS'' AND
> > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> PARTICULAR PURPOSE
> > > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
> BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> SUBSTITUTE GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> CONTRACT, STRICT
> > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
> IN ANY WAY
> > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> POSSIBILITY OF
> > > + * SUCH DAMAGE.
> > > + *
> > > + */
> > > +
> > > +#include <sys/cdefs.h>
> > > +__FBSDID("$FreeBSD$");
> > > +
> > > +#include <sys/param.h>
> > > +#include <sys/conf.h>
> > > +#include <sys/fcntl.h>
> > > +#include <sys/filio.h>
> > > +#include <sys/kernel.h>
> > > +#include <sys/lock.h>
> > > +#include <sys/malloc.h>
> > > +#include <sys/module.h>
> > > +#include <sys/poll.h>
> > > +#include <sys/queue.h>
> > > +#include <sys/refcount.h>
> > > +#include <sys/mutex.h>
> > > +#include <sys/selinfo.h>
> > > +#include <sys/socket.h>
> > > +#include <sys/socketvar.h>
> > > +#include <sys/sysctl.h>
> > > +#include <sys/tree.h>
> > > +#include <sys/uio.h>
> > > +#include <machine/atomic.h>
> > > +#include <sys/counter.h>
> > > +
> > > +#include <dev/tcp_log/tcp_log_dev.h>
> > > +
> > > +#ifdef TCPLOG_DEBUG_COUNTERS
> > > +extern counter_u64_t tcp_log_que_read;
> > > +extern counter_u64_t tcp_log_que_freed;
> > > +#endif
> > > +
> > > +static struct cdev *tcp_log_dev;
> > > +static struct selinfo tcp_log_sel;
> > > +
> > > +static struct log_queueh tcp_log_dev_queue_head =
> STAILQ_HEAD_INITIALIZER(tcp_log_dev_queue_head);
> > > +static struct log_infoh tcp_log_dev_reader_head =
> STAILQ_HEAD_INITIALIZER(tcp_log_dev_reader_head);
> > > +
> > > +MALLOC_DEFINE(M_TCPLOGDEV, "tcp_log_dev", "TCP log device data
> structures");
> > > +
> > > +static int tcp_log_dev_listeners = 0;
> > > +
> > > +static struct mtx tcp_log_dev_queue_lock;
> > > +
> > > > +#define TCP_LOG_DEV_QUEUE_LOCK() mtx_lock(&tcp_log_dev_queue_lock)
> > > > +#define TCP_LOG_DEV_QUEUE_UNLOCK() mtx_unlock(&tcp_log_dev_queue_lock)
> > > > +#define TCP_LOG_DEV_QUEUE_LOCK_ASSERT() mtx_assert(&tcp_log_dev_queue_lock, MA_OWNED)
> > > > +#define TCP_LOG_DEV_QUEUE_UNLOCK_ASSERT() mtx_assert(&tcp_log_dev_queue_lock, MA_NOTOWNED)
> > > > +#define TCP_LOG_DEV_QUEUE_REF(tldq) refcount_acquire(&((tldq)->tldq_refcnt))
> > > > +#define TCP_LOG_DEV_QUEUE_UNREF(tldq) refcount_release(&((tldq)->tldq_refcnt))
> > > +
> > > +static void tcp_log_dev_clear_refcount(struct
> tcp_log_dev_queue *entry);
> > > +static void tcp_log_dev_clear_cdevpriv(void *data);
> > > +static int tcp_log_dev_open(struct cdev *dev __unused, int flags,
> > > + int devtype __unused, struct thread *td __unused);
> > > +static int tcp_log_dev_write(struct cdev *dev __unused,
> > > + struct uio *uio __unused, int flags __unused);
> > > +static int tcp_log_dev_read(struct cdev *dev __unused, struct uio
> *uio,
> > > + int flags __unused);
> > > +static int tcp_log_dev_ioctl(struct cdev *dev __unused, u_long cmd,
> > > + caddr_t data, int fflag __unused, struct thread *td
> __unused);
> > > +static int tcp_log_dev_poll(struct cdev *dev __unused, int events,
> > > + struct thread *td);
> > > +
> > > +
> > > +enum tcp_log_dev_queue_lock_state {
> > > + QUEUE_UNLOCKED = 0,
> > > + QUEUE_LOCKED,
> > > +};
> > > +
> > > +static struct cdevsw tcp_log_cdevsw = {
> > > + .d_version = D_VERSION,
> > > + .d_read = tcp_log_dev_read,
> > > + .d_open = tcp_log_dev_open,
> > > + .d_write = tcp_log_dev_write,
> > > + .d_poll = tcp_log_dev_poll,
> > > + .d_ioctl = tcp_log_dev_ioctl,
> > > +#ifdef NOTYET
> > > + .d_mmap = tcp_log_dev_mmap,
> > > +#endif
> > > + .d_name = "tcp_log",
> > > +};
> > > +
> > > +static __inline void
> > > +tcp_log_dev_queue_validate_lock(int lockstate)
> > > +{
> > > +
> > > +#ifdef INVARIANTS
> > > + switch (lockstate) {
> > > + case QUEUE_LOCKED:
> > > + TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > + break;
> > > + case QUEUE_UNLOCKED:
> > > + TCP_LOG_DEV_QUEUE_UNLOCK_ASSERT();
> > > + break;
> > > + default:
> > > + kassert_panic("%s:%d: unknown queue lock state",
> __func__,
> > > + __LINE__);
> > > + }
> > > +#endif
> > > +}
> > > +
> > > +/*
> > > + * Clear the refcount. If appropriate, it will remove the entry
> from the
> > > + * queue and call the destructor.
> > > + *
> > > + * This must be called with the queue lock held.
> > > + */
> > > +static void
> > > +tcp_log_dev_clear_refcount(struct tcp_log_dev_queue *entry)
> > > +{
> > > +
> > > + KASSERT(entry != NULL, ("%s: called with NULL entry",
> __func__));
> > > +
> > > + TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > +
> > > + if (TCP_LOG_DEV_QUEUE_UNREF(entry)) {
> > > +#ifdef TCPLOG_DEBUG_COUNTERS
> > > + counter_u64_add(tcp_log_que_freed, 1);
> > > +#endif
> > > + /* Remove the entry from the queue and call the
> destructor. */
> > > + STAILQ_REMOVE(&tcp_log_dev_queue_head, entry,
> tcp_log_dev_queue,
> > > + tldq_queue);
> > > + (*entry->tldq_dtor)(entry);
> > > + }
> > > +}
> > > +
> > > +static void
> > > +tcp_log_dev_clear_cdevpriv(void *data)
> > > +{
> > > + struct tcp_log_dev_info *priv;
> > > + struct tcp_log_dev_queue *entry, *entry_tmp;
> > > +
> > > + priv = (struct tcp_log_dev_info *)data;
> > > + if (priv == NULL)
> > > + return;
> > > +
> > > + /*
> > > > + * Lock the queue and drop our references. We hold references to all
> > > > + * the entries starting with tldi_head (or, if tldi_head == NULL, all
> > > > + * entries in the queue).
> > > > + *
> > > > + * Because we don't want anyone adding additional things to the queue
> > > > + * while we are doing this, we lock the queue.
> > > + */
> > > + TCP_LOG_DEV_QUEUE_LOCK();
> > > + if (priv->tldi_head != NULL) {
> > > + entry = priv->tldi_head;
> > > + STAILQ_FOREACH_FROM_SAFE(entry,
> &tcp_log_dev_queue_head,
> > > + tldq_queue, entry_tmp) {
> > > + tcp_log_dev_clear_refcount(entry);
> > > + }
> > > + }
> > > + tcp_log_dev_listeners--;
> > > + KASSERT(tcp_log_dev_listeners >= 0,
> > > + ("%s: tcp_log_dev_listeners is unexpectedly negative",
> __func__));
> > > + STAILQ_REMOVE(&tcp_log_dev_reader_head, priv,
> tcp_log_dev_info,
> > > + tldi_list);
> > > + TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > + TCP_LOG_DEV_QUEUE_UNLOCK();
> > > + free(priv, M_TCPLOGDEV);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_open(struct cdev *dev __unused, int flags, int devtype
> __unused,
> > > + struct thread *td __unused)
> > > +{
> > > + struct tcp_log_dev_info *priv;
> > > + struct tcp_log_dev_queue *entry;
> > > + int rv;
> > > +
> > > + /*
> > > + * Ideally, we shouldn't see these because of file system
> > > + * permissions.
> > > + */
> > > + if (flags & (FWRITE | FEXEC | FAPPEND | O_TRUNC))
> > > + return (ENODEV);
> > > +
> > > + /* Allocate space to hold information about where we are. */
> > > + priv = malloc(sizeof(struct tcp_log_dev_info), M_TCPLOGDEV,
> > > + M_ZERO | M_WAITOK);
> > > +
> > > + /* Stash the private data away. */
> > > + rv = devfs_set_cdevpriv((void *)priv,
> tcp_log_dev_clear_cdevpriv);
> > > + if (!rv) {
> > > + /*
> > > + * Increase the listener count, add this reader to
> the list, and
> > > + * take references on all current queues.
> > > + */
> > > + TCP_LOG_DEV_QUEUE_LOCK();
> > > + tcp_log_dev_listeners++;
> > > + STAILQ_INSERT_HEAD(&tcp_log_dev_reader_head, priv,
> tldi_list);
> > > + priv->tldi_head =
> STAILQ_FIRST(&tcp_log_dev_queue_head);
> > > + if (priv->tldi_head != NULL)
> > > + priv->tldi_cur =
> priv->tldi_head->tldq_buf;
> > > + STAILQ_FOREACH(entry, &tcp_log_dev_queue_head,
> tldq_queue)
> > > + TCP_LOG_DEV_QUEUE_REF(entry);
> > > + TCP_LOG_DEV_QUEUE_UNLOCK();
> > > + } else {
> > > + /* Free the entry. */
> > > + free(priv, M_TCPLOGDEV);
> > > + }
> > > + return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_write(struct cdev *dev __unused, struct uio *uio
> __unused,
> > > + int flags __unused)
> > > +{
> > > +
> > > + return (ENODEV);
> > > +}
> > > +
> > > +static __inline void
> > > +tcp_log_dev_rotate_bufs(struct tcp_log_dev_info *priv, int
> *lockstate)
> > > +{
> > > + struct tcp_log_dev_queue *entry;
> > > +
> > > + KASSERT(priv->tldi_head != NULL,
> > > + ("%s:%d: priv->tldi_head unexpectedly NULL",
> > > + __func__, __LINE__));
> > > + KASSERT(priv->tldi_head->tldq_buf == priv->tldi_cur,
> > > + ("%s:%d: buffer mismatch (%p vs %p)",
> > > + __func__, __LINE__, priv->tldi_head->tldq_buf,
> > > + priv->tldi_cur));
> > > + tcp_log_dev_queue_validate_lock(*lockstate);
> > > +
> > > + if (*lockstate == QUEUE_UNLOCKED) {
> > > + TCP_LOG_DEV_QUEUE_LOCK();
> > > + *lockstate = QUEUE_LOCKED;
> > > + }
> > > + entry = priv->tldi_head;
> > > + priv->tldi_head = STAILQ_NEXT(entry, tldq_queue);
> > > + tcp_log_dev_clear_refcount(entry);
> > > + priv->tldi_cur = NULL;
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_read(struct cdev *dev __unused, struct uio *uio, int
> flags)
> > > +{
> > > + struct tcp_log_common_header *buf;
> > > + struct tcp_log_dev_info *priv;
> > > + struct tcp_log_dev_queue *entry;
> > > + ssize_t len;
> > > + int lockstate, rv;
> > > +
> > > + /* Get our private info. */
> > > + rv = devfs_get_cdevpriv((void **)&priv);
> > > + if (rv)
> > > + return (rv);
> > > +
> > > + lockstate = QUEUE_UNLOCKED;
> > > +
> > > + /* Do we need to get a new buffer? */
> > > + while (priv->tldi_cur == NULL ||
> > > + priv->tldi_cur->tlch_length <= priv->tldi_off) {
> > > + /* Did we somehow forget to rotate? */
> > > + KASSERT(priv->tldi_cur == NULL,
> > > + ("%s:%d: tldi_cur is unexpectedly non-NULL",
> __func__,
> > > + __LINE__));
> > > + if (priv->tldi_cur != NULL)
> > > + tcp_log_dev_rotate_bufs(priv,
> &lockstate);
> > > +
> > > + /*
> > > + * Before we start looking at tldi_head, we need a
> lock on the
> > > + * queue to make sure tldi_head stays stable.
> > > + */
> > > + if (lockstate == QUEUE_UNLOCKED) {
> > > + TCP_LOG_DEV_QUEUE_LOCK();
> > > + lockstate = QUEUE_LOCKED;
> > > + }
> > > +
> > > + /* We need the next buffer. Do we have one? */
> > > + if (priv->tldi_head == NULL && (flags &
> FNONBLOCK)) {
> > > + rv = EAGAIN;
> > > + goto done;
> > > + }
> > > + if (priv->tldi_head == NULL) {
> > > + /* Sleep and wait for more things we
> can read. */
> > > + rv = mtx_sleep(&tcp_log_dev_listeners,
> > > + &tcp_log_dev_queue_lock, PCATCH,
> "tcplogdev", 0);
> > > + if (rv)
> > > + goto done;
> > > + if (priv->tldi_head == NULL)
> > > + continue;
> > > + }
> > > +
> > > + /*
> > > + * We have an entry to read. We want to try to
> create a
> > > + * buffer, if one doesn't already exist.
> > > + */
> > > + entry = priv->tldi_head;
> > > + if (entry->tldq_buf == NULL) {
> > > + TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > + buf = (*entry->tldq_xform)(entry);
> > > + if (buf == NULL) {
> > > + rv = EBUSY;
> > > + goto done;
> > > + }
> > > + entry->tldq_buf = buf;
> > > + }
> > > +
> > > + priv->tldi_cur = entry->tldq_buf;
> > > + priv->tldi_off = 0;
> > > + }
> > > +
> > > + /* Copy what we can from this buffer to the output buffer. */
> > > + if (uio->uio_resid > 0) {
> > > + /* Drop locks so we can take page faults. */
> > > + if (lockstate == QUEUE_LOCKED)
> > > + TCP_LOG_DEV_QUEUE_UNLOCK();
> > > + lockstate = QUEUE_UNLOCKED;
> > > +
> > > + KASSERT(priv->tldi_cur != NULL,
> > > + ("%s: priv->tldi_cur is unexpectedly NULL",
> __func__));
> > > +
> > > + /* Copy as much as we can to this uio. */
> > > + len = priv->tldi_cur->tlch_length -
> priv->tldi_off;
> > > + if (len > uio->uio_resid)
> > > + len = uio->uio_resid;
> > > + rv = uiomove(((uint8_t *)priv->tldi_cur) +
> priv->tldi_off,
> > > + len, uio);
> > > + if (rv != 0)
> > > + goto done;
> > > + priv->tldi_off += len;
> > > +#ifdef TCPLOG_DEBUG_COUNTERS
> > > + counter_u64_add(tcp_log_que_read, len);
> > > +#endif
> > > + }
> > > + /* Are we done with this buffer? If so, find the next one. */
> > > + if (priv->tldi_off >= priv->tldi_cur->tlch_length) {
> > > + KASSERT(priv->tldi_off ==
> priv->tldi_cur->tlch_length,
> > > + ("%s: offset (%ju) exceeds length (%ju)",
> __func__,
> > > + (uintmax_t)priv->tldi_off,
> > > + (uintmax_t)priv->tldi_cur->tlch_length));
> > > + tcp_log_dev_rotate_bufs(priv, &lockstate);
> > > + }
> > > +done:
> > > + tcp_log_dev_queue_validate_lock(lockstate);
> > > + if (lockstate == QUEUE_LOCKED)
> > > + TCP_LOG_DEV_QUEUE_UNLOCK();
> > > + return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t
> data,
> > > + int fflag __unused, struct thread *td __unused)
> > > +{
> > > + struct tcp_log_dev_info *priv;
> > > + int rv;
> > > +
> > > + /* Get our private info. */
> > > + rv = devfs_get_cdevpriv((void **)&priv);
> > > + if (rv)
> > > + return (rv);
> > > +
> > > + /*
> > > + * Set things. Here, we are most concerned about the
> non-blocking I/O
> > > + * flag.
> > > + */
> > > + rv = 0;
> > > + switch (cmd) {
> > > + case FIONBIO:
> > > + break;
> > > + case FIOASYNC:
> > > + if (*(int *)data != 0)
> > > + rv = EINVAL;
> > > + break;
> > > + default:
> > > + rv = ENOIOCTL;
> > > + }
> > > + return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_poll(struct cdev *dev __unused, int events, struct
> thread *td)
> > > +{
> > > + struct tcp_log_dev_info *priv;
> > > + int revents;
> > > +
> > > + /*
> > > + * Get our private info. If this fails, claim that all events
> are
> > > + * ready. That should prod the user to do something that will
> > > + * make the error evident to them.
> > > + */
> > > + if (devfs_get_cdevpriv((void **)&priv))
> > > + return (events);
> > > +
> > > + revents = 0;
> > > + if (events & (POLLIN | POLLRDNORM)) {
> > > + /*
> > > + * We can (probably) read right now if we are
> partway through
> > > + * a buffer or if we are just about to start a
> buffer.
> > > + * Because we are going to read tldi_head, we
> should acquire
> > > + * a read lock on the queue.
> > > + */
> > > + TCP_LOG_DEV_QUEUE_LOCK();
> > > + if ((priv->tldi_head != NULL && priv->tldi_cur ==
> NULL) ||
> > > + (priv->tldi_cur != NULL &&
> > > + priv->tldi_off <
> priv->tldi_cur->tlch_length))
> > > + revents = events & (POLLIN |
> POLLRDNORM);
> > > + else
> > > + selrecord(td, &tcp_log_sel);
> > > + TCP_LOG_DEV_QUEUE_UNLOCK();
> > > + } else {
> > > + /*
> > > + * It only makes sense to poll for reading. So,
> again, prod the
> > > + * user to do something that will make the error
> of their ways
> > > + * apparent.
> > > + */
> > > + revents = events;
> > > + }
> > > + return (revents);
> > > +}
> > > +
> > > +int
> > > +tcp_log_dev_add_log(struct tcp_log_dev_queue *entry)
> > > +{
> > > + struct tcp_log_dev_info *priv;
> > > + int rv;
> > > + bool wakeup_needed;
> > > +
> > > + KASSERT(entry->tldq_buf != NULL || entry->tldq_xform != NULL,
> > > + ("%s: Called with both tldq_buf and tldq_xform set to
> NULL",
> > > + __func__));
> > > + KASSERT(entry->tldq_dtor != NULL,
> > > + ("%s: Called with tldq_dtor set to NULL", __func__));
> > > +
> > > + /* Get a lock on the queue. */
> > > + TCP_LOG_DEV_QUEUE_LOCK();
> > > +
> > > + /* If no one is listening, tell the caller to free the
> resources. */
> > > + if (tcp_log_dev_listeners == 0) {
> > > + rv = ENXIO;
> > > + goto done;
> > > + }
> > > +
> > > + /* Add this to the end of the tailq. */
> > > + STAILQ_INSERT_TAIL(&tcp_log_dev_queue_head, entry,
> tldq_queue);
> > > +
> > > + /* Add references for all current listeners. */
> > > + refcount_init(&entry->tldq_refcnt, tcp_log_dev_listeners);
> > > +
> > > + /*
> > > + * If any listener is currently stuck on NULL, that means they
> are
> > > + * waiting. Point their head to this new entry.
> > > + */
> > > + wakeup_needed = false;
> > > + STAILQ_FOREACH(priv, &tcp_log_dev_reader_head, tldi_list)
> > > + if (priv->tldi_head == NULL) {
> > > + priv->tldi_head = entry;
> > > + wakeup_needed = true;
> > > + }
> > > +
> > > + if (wakeup_needed) {
> > > + selwakeup(&tcp_log_sel);
> > > + wakeup(&tcp_log_dev_listeners);
> > > + }
> > > +
> > > + rv = 0;
> > > +
> > > +done:
> > > + TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > + TCP_LOG_DEV_QUEUE_UNLOCK();
> > > + return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_modevent(module_t mod __unused, int type, void *data
> __unused)
> > > +{
> > > +
> > > + /* TODO: Support intelligent unloading. */
> > > + switch (type) {
> > > + case MOD_LOAD:
> > > + if (bootverbose)
> > > + printf("tcp_log: tcp_log device\n");
> > > + memset(&tcp_log_sel, 0, sizeof(tcp_log_sel));
> > > + memset(&tcp_log_dev_queue_lock, 0, sizeof(struct
> mtx));
> > > + mtx_init(&tcp_log_dev_queue_lock, "tcp_log dev",
> > > + "tcp_log device queues", MTX_DEF);
> > > + tcp_log_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD,
> > > + &tcp_log_cdevsw, 0, NULL, UID_ROOT,
> GID_WHEEL, 0400,
> > > + "tcp_log");
> > > + break;
> > > + default:
> > > + return (EOPNOTSUPP);
> > > + }
> > > +
> > > + return (0);
> > > +}
> > > +
> > > +DEV_MODULE(tcp_log_dev, tcp_log_dev_modevent, NULL);
> > > +MODULE_VERSION(tcp_log_dev, 1);
> > >
> > > Added: head/sys/dev/tcp_log/tcp_log_dev.h
> > >
> ==============================================================================
> > > --- /dev/null 00:00:00 1970 (empty, because file is
> newly added)
> > > +++ head/sys/dev/tcp_log/tcp_log_dev.h Thu Mar 22 09:40:08
> 2018 (r331347)
> > > @@ -0,0 +1,88 @@
> > > +/*-
> > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> > > + *
> > > + * Copyright (c) 2016
> > > + * Netflix Inc. All rights reserved.
> > > + *
> > > + * Redistribution and use in source and binary forms, with or
> without
> > > + * modification, are permitted provided that the following
> conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above
> copyright
> > > + * notice, this list of conditions and the following
> disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above
> copyright
> > > + * notice, this list of conditions and the following
> disclaimer in the
> > > + * documentation and/or other materials provided with the
> distribution.
> > > + *
> > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
> IS'' AND
> > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> PARTICULAR PURPOSE
> > > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
> BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> SUBSTITUTE GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> CONTRACT, STRICT
> > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
> IN ANY WAY
> > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> POSSIBILITY OF
> > > + * SUCH DAMAGE.
> > > + *
> > > + * $FreeBSD$
> > > + */
> > > +
> > > +#ifndef __tcp_log_dev_h__
> > > +#define __tcp_log_dev_h__
> > > +
> > > +/*
> > > + * This is the common header for data streamed from the log device.
> All
> > > + * blocks of data need to start with this header.
> > > + */
> > > +struct tcp_log_common_header {
> > > + uint32_t tlch_version; /* Version is specific
> to type. */
> > > + uint32_t tlch_type; /* Type of entry(ies)
> that follow. */
> > > + uint64_t tlch_length; /* Total length,
> including header. */
> > > +} __packed;
> > > +
> > > +#define TCP_LOG_DEV_TYPE_BBR 1 /* black box
> recorder */
> > > +
> > > +#ifdef _KERNEL
> > > +/*
> > > > + * This is a queue entry. All queue entries need to start with this structure
> > > > + * so the common code can cast them to this structure; however, other modules
> > > > + * are free to include additional data after this structure.
> > > > + *
> > > > + * The elements are explained here:
> > > > + * tldq_queue: used by the common code to maintain this entry's position in the
> > > > + * queue.
> > > > + * tldq_buf: should be NULL, or a pointer to a chunk of data. The data must be
> > > > + * as long as the common header indicates.
> > > > + * tldq_xform: If tldq_buf is NULL, the code will call this to create the
> > > > + * tldq_buf object. The function should *not* directly modify tldq_buf,
> > > > + * but should return the buffer (which must meet the restrictions
> > > > + * indicated for tldq_buf).
> > > > + * tldq_dtor: This function is called to free the queue entry. If tldq_buf is
> > > > + * not NULL, the dtor function must free that, too.
> > > > + * tldq_refcnt: used by the common code to indicate how many readers still need
> > > > + * this data.
> > > + */
> > > +struct tcp_log_dev_queue {
> > > + STAILQ_ENTRY(tcp_log_dev_queue) tldq_queue;
> > > + struct tcp_log_common_header *tldq_buf;
> > > + struct tcp_log_common_header *(*tldq_xform)(struct
> tcp_log_dev_queue *entry);
> > > + void (*tldq_dtor)(struct tcp_log_dev_queue *entry);
> > > + volatile u_int tldq_refcnt;
> > > +};
> > > +
> > > +STAILQ_HEAD(log_queueh, tcp_log_dev_queue);
> > > +
> > > +struct tcp_log_dev_info {
> > > + STAILQ_ENTRY(tcp_log_dev_info) tldi_list;
> > > + struct tcp_log_dev_queue *tldi_head;
> > > + struct tcp_log_common_header *tldi_cur;
> > > + off_t tldi_off;
> > > +};
> > > +STAILQ_HEAD(log_infoh, tcp_log_dev_info);
> > > +
> > > +
> > > +MALLOC_DECLARE(M_TCPLOGDEV);
> > > +int tcp_log_dev_add_log(struct tcp_log_dev_queue *entry);
> > > +#endif /* _KERNEL */
> > > +#endif /* !__tcp_log_dev_h__ */
> > >
> > > Modified: head/sys/kern/subr_witness.c
> > >
> ==============================================================================
> > > --- head/sys/kern/subr_witness.c Thu Mar 22 08:32:39 2018
> (r331346)
> > > +++ head/sys/kern/subr_witness.c Thu Mar 22 09:40:08 2018
> (r331347)
> > > @@ -640,6 +640,14 @@ static struct witness_order_list_entry
> order_lists[] =
> > > { "db->db_mtx", &lock_class_sx },
> > > { NULL, NULL },
> > > /*
> > > + * TCP log locks
> > > + */
> > > + { "TCP ID tree", &lock_class_rw },
> > > + { "tcp log id bucket", &lock_class_mtx_sleep },
> > > + { "tcpinp", &lock_class_rw },
> > > + { "TCP log expireq", &lock_class_mtx_sleep },
> > > + { NULL, NULL },
> > > + /*
> > > * spin locks
> > > */
> > > #ifdef SMP
> > >
> > > Modified: head/sys/netinet/tcp.h
> > >
> ==============================================================================
> > > --- head/sys/netinet/tcp.h Thu Mar 22 08:32:39 2018
> (r331346)
> > > +++ head/sys/netinet/tcp.h Thu Mar 22 09:40:08 2018
> (r331347)
> > > @@ -168,6 +168,12 @@ struct tcphdr {
> > > #define TCP_NOOPT 8 /* don't use TCP options */
> > > #define TCP_MD5SIG 16 /* use MD5 digests (RFC2385) */
> > > > #define TCP_INFO 32 /* retrieve tcp_info structure */
> > > > +#define TCP_LOG 34 /* configure event logging for connection */
> > > > +#define TCP_LOGBUF 35 /* retrieve event log for connection */
> > > > +#define TCP_LOGID 36 /* configure log ID to correlate connections */
> > > > +#define TCP_LOGDUMP 37 /* dump connection log events to device */
> > > > +#define TCP_LOGDUMPID 38 /* dump events from connections with same ID to
> > > > + device */
> > > > #define TCP_CONGESTION 64 /* get/set congestion control algorithm */
> > > > #define TCP_CCALGOOPT 65 /* get/set cc algorithm specific options */
> > > > #define TCP_KEEPINIT 128 /* N, time to establish connection */
> > > @@ -188,6 +194,9 @@ struct tcphdr {
> > > #define TCPI_OPT_WSCALE 0x04
> > > #define TCPI_OPT_ECN 0x08
> > > #define TCPI_OPT_TOE 0x10
> > > +
> > > +/* Maximum length of log ID. */
> > > +#define TCP_LOG_ID_LEN 64
> > >
> > > /*
> > > * The TCP_INFO socket option comes from the Linux 2.6 TCP API,
> and permits
> > >
> > > Modified: head/sys/netinet/tcp_input.c
> > >
> ==============================================================================
> > > --- head/sys/netinet/tcp_input.c Thu Mar 22 08:32:39 2018
> (r331346)
> > > +++ head/sys/netinet/tcp_input.c Thu Mar 22 09:40:08 2018
> (r331347)
> > > @@ -102,6 +102,7 @@ __FBSDID("$FreeBSD$");
> > > #include <netinet6/nd6.h>
> > > #include <netinet/tcp.h>
> > > #include <netinet/tcp_fsm.h>
> > > +#include <netinet/tcp_log_buf.h>
> > > #include <netinet/tcp_seq.h>
> > > #include <netinet/tcp_timer.h>
> > > #include <netinet/tcp_var.h>
> > > @@ -1592,6 +1593,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr
> *th, stru
> > > /* Save segment, if requested. */
> > > tcp_pcap_add(th, m, &(tp->t_inpkts));
> > > #endif
> > > + TCP_LOG_EVENT(tp, th, &so->so_rcv, &so->so_snd, TCP_LOG_IN, 0,
> > > + tlen, NULL, true);
> > >
> > > if ((thflags & TH_SYN) && (thflags & TH_FIN) && V_drop_synfin) {
> > > if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
> > >
> > > Added: head/sys/netinet/tcp_log_buf.c
> > > ==============================================================================
> > > --- /dev/null 00:00:00 1970 (empty, because file is newly added)
> > > +++ head/sys/netinet/tcp_log_buf.c Thu Mar 22 09:40:08 2018 (r331347)
> > > @@ -0,0 +1,2480 @@
> > > +/*-
> > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> > > + *
> > > + * Copyright (c) 2016-2018
> > > + * Netflix Inc. All rights reserved.
> > > + *
> > > + * Redistribution and use in source and binary forms, with or without
> > > + * modification, are permitted provided that the following conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above copyright
> > > + * notice, this list of conditions and the following disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above copyright
> > > + * notice, this list of conditions and the following disclaimer in the
> > > + * documentation and/or other materials provided with the distribution.
> > > + *
> > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> > > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > > + * SUCH DAMAGE.
> > > + *
> > > + */
> > > +
> > > +#include <sys/cdefs.h>
> > > +__FBSDID("$FreeBSD$");
> > > +
> > > +#include <sys/param.h>
> > > +#include <sys/kernel.h>
> > > +#include <sys/lock.h>
> > > +#include <sys/malloc.h>
> > > +#include <sys/mutex.h>
> > > +#include <sys/queue.h>
> > > +#include <sys/refcount.h>
> > > +#include <sys/rwlock.h>
> > > +#include <sys/socket.h>
> > > +#include <sys/socketvar.h>
> > > +#include <sys/sysctl.h>
> > > +#include <sys/tree.h>
> > > +#include <sys/counter.h>
> > > +
> > > +#include <dev/tcp_log/tcp_log_dev.h>
> > > +
> > > +#include <net/if.h>
> > > +#include <net/if_var.h>
> > > +#include <net/vnet.h>
> > > +
> > > +#include <netinet/in.h>
> > > +#include <netinet/in_pcb.h>
> > > +#include <netinet/in_var.h>
> > > +#include <netinet/tcp_var.h>
> > > +#include <netinet/tcp_log_buf.h>
> > > +
> > > +/* Default expiry time */
> > > +#define TCP_LOG_EXPIRE_TIME ((sbintime_t)60 * SBT_1S)
> > > +
> > > +/* Max interval at which to run the expiry timer */
> > > +#define TCP_LOG_EXPIRE_INTVL ((sbintime_t)5 * SBT_1S)
> > > +
> > > +bool tcp_log_verbose;
> > > +static uma_zone_t tcp_log_bucket_zone, tcp_log_node_zone, tcp_log_zone;
> > > +static int tcp_log_session_limit = TCP_LOG_BUF_DEFAULT_SESSION_LIMIT;
> > > +static uint32_t tcp_log_version = TCP_LOG_BUF_VER;
> > > +RB_HEAD(tcp_log_id_tree, tcp_log_id_bucket);
> > > +static struct tcp_log_id_tree tcp_log_id_head;
> > > +static STAILQ_HEAD(, tcp_log_id_node) tcp_log_expireq_head =
> > > + STAILQ_HEAD_INITIALIZER(tcp_log_expireq_head);
> > > +static struct mtx tcp_log_expireq_mtx;
> > > +static struct callout tcp_log_expireq_callout;
> > > +static uint64_t tcp_log_auto_ratio = 0;
> > > +static uint64_t tcp_log_auto_ratio_cur = 0;
> > > +static uint32_t tcp_log_auto_mode = TCP_LOG_STATE_TAIL;
> > > +static bool tcp_log_auto_all = false;
> > > +
> > > +RB_PROTOTYPE_STATIC(tcp_log_id_tree, tcp_log_id_bucket, tlb_rb, tcp_log_id_cmp)
> > > +
> > > +SYSCTL_NODE(_net_inet_tcp, OID_AUTO, bb, CTLFLAG_RW, 0, "TCP Black Box controls");
> > > +
> > > +SYSCTL_BOOL(_net_inet_tcp_bb, OID_AUTO, log_verbose, CTLFLAG_RW, &tcp_log_verbose,
> > > + 0, "Force verbose logging for TCP traces");
> > > +
> > > +SYSCTL_INT(_net_inet_tcp_bb, OID_AUTO, log_session_limit,
> > > + CTLFLAG_RW, &tcp_log_session_limit, 0,
> > > + "Maximum number of events maintained for each TCP session");
> > > +
> > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_global_limit, CTLFLAG_RW,
> > > + &tcp_log_zone, "Maximum number of events maintained for all TCP sessions");
> > > +
> > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_global_entries, CTLFLAG_RD,
> > > + &tcp_log_zone, "Current number of events maintained for all TCP sessions");
> > > +
> > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_id_limit, CTLFLAG_RW,
> > > + &tcp_log_bucket_zone, "Maximum number of log IDs");
> > > +
> > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_id_entries, CTLFLAG_RD,
> > > + &tcp_log_bucket_zone, "Current number of log IDs");
> > > +
> > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_id_tcpcb_limit, CTLFLAG_RW,
> > > + &tcp_log_node_zone, "Maximum number of tcpcbs with log IDs");
> > > +
> > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_id_tcpcb_entries, CTLFLAG_RD,
> > > + &tcp_log_node_zone, "Current number of tcpcbs with log IDs");
> > > +
> > > +SYSCTL_U32(_net_inet_tcp_bb, OID_AUTO, log_version, CTLFLAG_RD, &tcp_log_version,
> > > + 0, "Version of log formats exported");
> > > +
> > > +SYSCTL_U64(_net_inet_tcp_bb, OID_AUTO, log_auto_ratio, CTLFLAG_RW,
> > > + &tcp_log_auto_ratio, 0, "Do auto capturing for 1 out of N sessions");
> > > +
> > > +SYSCTL_U32(_net_inet_tcp_bb, OID_AUTO, log_auto_mode, CTLFLAG_RW,
> > > + &tcp_log_auto_mode, TCP_LOG_STATE_HEAD_AUTO,
> > > + "Logging mode for auto-selected sessions (default is TCP_LOG_STATE_HEAD_AUTO)");
> > > +
> > > +SYSCTL_BOOL(_net_inet_tcp_bb, OID_AUTO, log_auto_all, CTLFLAG_RW,
> > > + &tcp_log_auto_all, false,
> > > + "Auto-select from all sessions (rather than just those with IDs)");
> > > +
> > > +#ifdef TCPLOG_DEBUG_COUNTERS
> > > +counter_u64_t tcp_log_queued;
> > > +counter_u64_t tcp_log_que_fail1;
> > > +counter_u64_t tcp_log_que_fail2;
> > > +counter_u64_t tcp_log_que_fail3;
> > > +counter_u64_t tcp_log_que_fail4;
> > > +counter_u64_t tcp_log_que_fail5;
> > > +counter_u64_t tcp_log_que_copyout;
> > > +counter_u64_t tcp_log_que_read;
> > > +counter_u64_t tcp_log_que_freed;
> > > +
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, queued, CTLFLAG_RD,
> > > + &tcp_log_queued, "Number of entries queued");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail1, CTLFLAG_RD,
> > > + &tcp_log_que_fail1, "Number of entries queued but fail 1");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail2, CTLFLAG_RD,
> > > + &tcp_log_que_fail2, "Number of entries queued but fail 2");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail3, CTLFLAG_RD,
> > > + &tcp_log_que_fail3, "Number of entries queued but fail 3");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail4, CTLFLAG_RD,
> > > + &tcp_log_que_fail4, "Number of entries queued but fail 4");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail5, CTLFLAG_RD,
> > > + &tcp_log_que_fail5, "Number of entries queued but fail 5");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, copyout, CTLFLAG_RD,
> > > + &tcp_log_que_copyout, "Number of entries copied out");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, read, CTLFLAG_RD,
> > > + &tcp_log_que_read, "Number of entries read from the queue");
> > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, freed, CTLFLAG_RD,
> > > + &tcp_log_que_freed, "Number of entries freed after reading");
> > > +#endif
> > > +
> > > +#ifdef INVARIANTS
> > > +#define TCPLOG_DEBUG_RINGBUF
> > > +#endif
> > > +
> > > +struct tcp_log_mem
> > > +{
> > > + STAILQ_ENTRY(tcp_log_mem) tlm_queue;
> > > + struct tcp_log_buffer tlm_buf;
> > > + struct tcp_log_verbose tlm_v;
> > > +#ifdef TCPLOG_DEBUG_RINGBUF
> > > + volatile int tlm_refcnt;
> > > +#endif
> > > +};
> > > +
> > > +/* 60 bytes for the header, + 16 bytes for padding */
> > > +static uint8_t zerobuf[76];
> > > +
> > > +/*
> > > + * Lock order:
> > > + * 1. TCPID_TREE
> > > + * 2. TCPID_BUCKET
> > > + * 3. INP
> > > + *
> > > + * Rules:
> > > + * A. You need a lock on the Tree to add/remove buckets.
> > > + * B. You need a lock on the bucket to add/remove nodes from the bucket.
> > > + * C. To change information in a node, you need the INP lock if the tln_closed
> > > + * field is false. Otherwise, you need the bucket lock. (Note that the
> > > + * tln_closed field can change at any point, so you need to recheck the
> > > + * entry after acquiring the INP lock.)
> > > + * D. To remove a node from the bucket, you must have that entry locked,
> > > + * according to the criteria of Rule C. Also, the node must not be on
> > > + * the expiry queue.
> > > + * E. The exception to C is the expiry queue fields, which are locked by
> > > + * the TCPLOG_EXPIREQ lock.
> > > + *
> > > + * Buckets have a reference count. Each node is a reference. Further,
> > > + * other callers may add reference counts to keep a bucket from disappearing.
> > > + * You can add a reference as long as you own a lock sufficient to keep the
> > > + * bucket from disappearing. For example, a common use is:
> > > + * a. Have a locked INP, but need to lock the TCPID_BUCKET.
> > > + * b. Add a refcount on the bucket. (Safe because the INP lock prevents
> > > + * the TCPID_BUCKET from going away.)
> > > + * c. Drop the INP lock.
> > > + * d. Acquire a lock on the TCPID_BUCKET.
> > > + * e. Acquire a lock on the INP.
> > > + * f. Drop the refcount on the bucket.
> > > + * (At this point, the bucket may disappear.)
> > > + *
> > > + * Expire queue lock:
> > > + * You can acquire this with either the bucket or INP lock. Don't reverse it.
> > > + * When the expire code has committed to freeing a node, it resets the expiry
> > > + * time to SBT_MAX. That is the signal to everyone else that they should
> > > + * leave that node alone.
> > > + */
> > > +static struct rwlock tcp_id_tree_lock;
> > > +#define TCPID_TREE_WLOCK() rw_wlock(&tcp_id_tree_lock)
> > > +#define TCPID_TREE_RLOCK() rw_rlock(&tcp_id_tree_lock)
> > > +#define TCPID_TREE_UPGRADE() rw_try_upgrade(&tcp_id_tree_lock)
> > > +#define TCPID_TREE_WUNLOCK() rw_wunlock(&tcp_id_tree_lock)
> > > +#define TCPID_TREE_RUNLOCK() rw_runlock(&tcp_id_tree_lock)
> > > +#define TCPID_TREE_WLOCK_ASSERT() rw_assert(&tcp_id_tree_lock, RA_WLOCKED)
> > > +#define TCPID_TREE_RLOCK_ASSERT() rw_assert(&tcp_id_tree_lock, RA_RLOCKED)
> > > +#define TCPID_TREE_UNLOCK_ASSERT() rw_assert(&tcp_id_tree_lock, RA_UNLOCKED)
> > > +
> > > +#define TCPID_BUCKET_LOCK_INIT(tlb) mtx_init(&((tlb)->tlb_mtx), "tcp log id bucket", NULL, MTX_DEF)
> > > +#define TCPID_BUCKET_LOCK_DESTROY(tlb) mtx_destroy(&((tlb)->tlb_mtx))
> > > +#define TCPID_BUCKET_LOCK(tlb) mtx_lock(&((tlb)->tlb_mtx))
> > > +#define TCPID_BUCKET_UNLOCK(tlb) mtx_unlock(&((tlb)->tlb_mtx))
> > > +#define TCPID_BUCKET_LOCK_ASSERT(tlb) mtx_assert(&((tlb)->tlb_mtx), MA_OWNED)
> > > +#define TCPID_BUCKET_UNLOCK_ASSERT(tlb) mtx_assert(&((tlb)->tlb_mtx), MA_NOTOWNED)
> > > +
> > > +#define TCPID_BUCKET_REF(tlb) refcount_acquire(&((tlb)->tlb_refcnt))
> > > +#define TCPID_BUCKET_UNREF(tlb) refcount_release(&((tlb)->tlb_refcnt))
> > > +
> > > +#define TCPLOG_EXPIREQ_LOCK() mtx_lock(&tcp_log_expireq_mtx)
> > > +#define TCPLOG_EXPIREQ_UNLOCK() mtx_unlock(&tcp_log_expireq_mtx)
> > > +
> > > +SLIST_HEAD(tcp_log_id_head, tcp_log_id_node);
> > > +
> > > +struct tcp_log_id_bucket
> > > +{
> > > + /*
> > > + * tlb_id must be first. This lets us use strcmp on
> > > + * (struct tcp_log_id_bucket *) and (char *) interchangeably.
> > > + */
> > > + char tlb_id[TCP_LOG_ID_LEN];
> > > + RB_ENTRY(tcp_log_id_bucket) tlb_rb;
> > > + struct tcp_log_id_head tlb_head;
> > > + struct mtx tlb_mtx;
> > > + volatile u_int tlb_refcnt;
> > > +};
> > > +
> > > +struct tcp_log_id_node
> > > +{
> > > + SLIST_ENTRY(tcp_log_id_node) tln_list;
> > > + STAILQ_ENTRY(tcp_log_id_node) tln_expireq; /* Locked by the expireq lock */
> > > + sbintime_t tln_expiretime; /* Locked by the expireq lock */
> > > +
> > > + /*
> > > + * If INP is NULL, that means the connection has closed. We've
> > > + * saved the connection endpoint information and the log entries
> > > + * in the tln_ie and tln_entries members. We've also saved a pointer
> > > + * to the enclosing bucket here. If INP is not NULL, the information is
> > > + * in the PCB and not here.
> > > + */
> > > + struct inpcb *tln_inp;
> > > + struct tcpcb *tln_tp;
> > > + struct tcp_log_id_bucket *tln_bucket;
> > > + struct in_endpoints tln_ie;
> > > + struct tcp_log_stailq tln_entries;
> > > + int tln_count;
> > > + volatile int tln_closed;
> > > + uint8_t tln_af;
> > > +};
> > > +
> > > +enum tree_lock_state {
> > > + TREE_UNLOCKED = 0,
> > > + TREE_RLOCKED,
> > > + TREE_WLOCKED,
> > > +};
> > > +
> > > +/* Do we want to select this session for auto-logging? */
> > > +static __inline bool
> > > +tcp_log_selectauto(void)
> > > +{
> > > +
> > > + /*
> > >
> > > *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
> > >
> >
>
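
For anyone following the lock-order comment in tcp_log_buf.c quoted
above, steps a-f can be sketched in userland roughly as below. This is
only an illustration under stated assumptions: the sketch_* names are
made up, pthread mutexes and C11 atomics stand in for the kernel's
mutexes and refcounts, and the recheck after step f is my guess at what
a caller would need, not code taken from the commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bucket and the INP; not the kernel structs. */
struct sketch_bucket {
	pthread_mutex_t	mtx;	/* plays the role of TCPID_BUCKET */
	atomic_uint	refcnt;	/* one reference per node, plus temporary holds */
};

struct sketch_conn {
	pthread_mutex_t		inp_lock;	/* plays the role of the INP lock */
	struct sketch_bucket	*bucket;
};

/*
 * Enter with c->inp_lock held.  On success, return the bucket with both
 * the bucket lock and the INP lock held.  Return NULL, with only the INP
 * lock held, if the bucket can no longer be used.
 */
static struct sketch_bucket *
sketch_lock_bucket(struct sketch_conn *c)
{
	struct sketch_bucket *b;
	bool last;

	assert(c != NULL);
	b = c->bucket;					/* a. INP is locked */
	if (b == NULL)
		return (NULL);
	atomic_fetch_add(&b->refcnt, 1);		/* b. hold the bucket */
	pthread_mutex_unlock(&c->inp_lock);		/* c. drop the INP lock */
	pthread_mutex_lock(&b->mtx);			/* d. lock the bucket */
	pthread_mutex_lock(&c->inp_lock);		/* e. relock the INP */
	last = (atomic_fetch_sub(&b->refcnt, 1) == 1);	/* f. drop the hold */

	/*
	 * Assumption: the connection may have changed buckets while the INP
	 * lock was dropped, and ours may have been the last reference.
	 */
	if (last || c->bucket != b) {
		pthread_mutex_unlock(&b->mtx);
		return (NULL);	/* freeing an orphaned bucket is omitted here */
	}
	return (b);
}
```

The temporary reference is what makes step c safe: once the INP lock is
dropped, nothing else in this sketch keeps the bucket alive until the
bucket lock is taken in step d.
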
> References
>
> Visible links
> 1. mailto:ruslan.bukin at cl.cam.ac.uk
> 2. https://svnweb.freebsd.org/changeset/base/331347
> 3. https://reviews.freebsd.org/D11085
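
On the "tlb_id must be first" comment in struct tcp_log_id_bucket: the
trick relies on a pointer to a struct also being a valid pointer to its
first member. A minimal userland sketch follows. The sketch_* names are
hypothetical, the struct is cut down to two fields, and I use the
bounded strncmp() rather than strcmp() as a cautious choice of my own;
I have not checked which one tcp_log_id_cmp() uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ID_LEN	64	/* same value as TCP_LOG_ID_LEN in the diff */

/* Hypothetical, cut-down bucket; only the layout rule matters here. */
struct sketch_bucket {
	char		id[SKETCH_ID_LEN];	/* must stay the first member */
	unsigned int	refcnt;
};

/* Fail the build if someone moves the ID away from offset 0. */
static_assert(offsetof(struct sketch_bucket, id) == 0,
    "id must be the first member");

/*
 * Compare two keys, each of which may be a bucket or a bare
 * NUL-terminated ID string.  A pointer to a struct also points at its
 * first member, so both cases can be read as char pointers.
 */
static int
sketch_id_cmp(const void *a, const void *b)
{
	return (strncmp((const char *)a, (const char *)b, SKETCH_ID_LEN));
}
```

With this layout a lookup can pass the ID string itself as the search
key, without first building a temporary bucket around it.
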