Re: AF_UNIX socketpair dgram queue sizes
- In reply to: Jan Schaumann via freebsd-net : "Re: AF_UNIX socketpair dgram queue sizes"
Date: Wed, 10 Nov 2021 15:53:29 UTC
On Wed, Nov 10, 2021 at 12:05:33AM -0500, Jan Schaumann via freebsd-net wrote:
> Mark Johnston <markj@freebsd.org> wrote:
>
> > There is an additional factor: wasted space.  When writing data to a
> > socket, the kernel buffers that data in mbufs.  All mbufs have some
> > amount of embedded storage, and the kernel accounts for that storage,
> > whether or not it's used.  With small byte datagrams there can be a lot
> > of overhead;
>
> I'm observing two mbufs being allocated for each
> datagram for small datagrams, but only one mbuf for
> larger datagrams.
>
> That seems counter-intuitive to me?

From my reading, sbappendaddr_locked_internal() will always allocate an
extra mbuf for the address, so I can't explain this.  What's the
threshold for "larger"?  How are you counting mbuf allocations?

> > The kern.ipc.sockbuf_waste_factor sysctl controls the upper limit on
> > total bytes (used or not) that may be enqueued in a socket buffer.  The
> > default value of 8 means that we'll waste up to 7 bytes per byte of
> > data, I think.  Setting it higher should let you enqueue more messages.
>
> Ah, this looks like something relevant.
>
> Setting kern.ipc.sockbuf_waste_factor=1, I can only
> write 8 1-byte datagrams.  For any increase of the
> waste factor by one, I get another 8 1-byte datagrams,
> up until waste factor > 29, at which point we hit
> recvspace: 30 * 8 = 240, so 240 1-byte datagrams with
> 16 bytes dgram overhead means we get 240*17 = 4080
> bytes, which just fits (well, with room for one empty
> 16-byte dgram) into the recvspace = 4096.
>
> But I still don't get the direct relationship between
> the waste factor and the recvspace / buffer queue:
> with a waste_factor of 1 and a datagram with 1972
> bytes, I'm able to write one dgram with 1972 bytes +
> 1 dgram with 1520 bytes = 3492 bytes (plus 2 * 16
> bytes overhead = 3524 bytes).  There'd still have been
> space for 572 more bytes in the second dgram.

For a datagram of size 1972, we'll allocate one mbuf (256 bytes) and
one mbuf "cluster" (2048 bytes), and then a second 256-byte mbuf for
the address.  So sb_mbcnt will be 2560 bytes, leaving 1536 bytes of
space for a second datagram.

> Likewise, trying to write a single 1973 dgram fills
> the queue and no additional bytes can be written in a
> second dgram, but I can write a single 2048 byte
> dgram.

I suspect that this bit of the unix socket code might be related:
https://cgit.freebsd.org/src/tree/sys/kern/uipc_usrreq.c#n1144

Here we get the amount of space available in the recv buffer (sbcc) and
compare it with the data limit in the _send_ buffer to determine
whether to apply backpressure.  You wrote "SO_SNDBUF = 2048" in your
first email, and if that's the case here then writing ~2000 bytes would
cause the limit to be hit.  I'm not sure why 1973 is the magic value
here.

> Still confused...
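
For anyone who wants to reproduce the experiment described above, here is a
minimal userspace sketch (an illustration of the kind of test being discussed,
not the actual program used in this thread): it fills one side of an AF_UNIX
SOCK_DGRAM socketpair with fixed-size datagrams and reports how many the
kernel accepts before refusing the next send.  Re-running it under different
kern.ipc.sockbuf_waste_factor and net.local.dgram.recvspace settings should
show the counts discussed above.

```c
/*
 * Sketch: queue fixed-size datagrams on an AF_UNIX SOCK_DGRAM socketpair
 * until the kernel applies backpressure, then report how many fit.
 */
#include <sys/socket.h>

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	size_t dgramsz;
	char *buf;
	int fds[2], n;

	/* Datagram size is the first argument; defaults to 1 byte. */
	dgramsz = (argc > 1) ? (size_t)strtoul(argv[1], NULL, 10) : 1;
	if ((buf = calloc(1, dgramsz)) == NULL)
		err(1, "calloc");

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
		err(1, "socketpair");

	/* Non-blocking sender, so a full receive buffer returns an error. */
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0)
		err(1, "fcntl");

	/* Queue datagrams until the kernel refuses to accept more. */
	for (n = 0; send(fds[0], buf, dgramsz, 0) == (ssize_t)dgramsz; n++)
		;
	printf("queued %d datagrams of %zu bytes before send failed: %s\n",
	    n, dgramsz, strerror(errno));

	close(fds[0]);
	close(fds[1]);
	return (0);
}
```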
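
And here is a back-of-the-envelope model of the accounting that seems to
reproduce the numbers above.  It mimics my understanding of sbspace(): the
available space is the smaller of the data-byte room (hiwat - cc) and the
mbuf-storage room (mbmax - mbcnt), with mbmax being waste_factor * hiwat,
and each datagram is charged a 256-byte mbuf for the address plus a 256-byte
mbuf (and a 2048-byte cluster for larger payloads) for the data.  The
constants and the cluster threshold here are assumptions for illustration,
not copied from the kernel.

```c
/*
 * Rough model (not kernel code) of socket-buffer accounting for AF_UNIX
 * datagrams: count how many datagrams of a given payload size fit before
 * either the data-byte limit or the mbuf-storage limit is reached.
 */
#include <stdio.h>

#define MSIZE		256L	/* assumed bytes charged per mbuf */
#define MCLBYTES	2048L	/* assumed bytes charged per mbuf cluster */
#define ADDRLEN		16L	/* assumed sockaddr bytes per datagram */
#define CLUSTER_THRESH	MSIZE	/* roughly MHLEN; close enough for this model */

int
main(void)
{
	long hiwat = 4096;	/* net.local.dgram.recvspace */
	long waste = 1;		/* kern.ipc.sockbuf_waste_factor */
	long payload = 1;	/* datagram size being tested */
	long mbmax = waste * hiwat;
	long cc = 0, mbcnt = 0, n = 0;

	for (;;) {
		/* sbspace(): smaller of data-byte room and mbuf-byte room. */
		long space = hiwat - cc;

		if (mbmax - mbcnt < space)
			space = mbmax - mbcnt;
		if (payload + ADDRLEN > space)
			break;
		cc += payload + ADDRLEN;
		mbcnt += 2 * MSIZE + (payload > CLUSTER_THRESH ? MCLBYTES : 0);
		n++;
	}
	printf("%ld datagrams of %ld bytes fit (cc %ld, mbcnt %ld)\n",
	    n, payload, cc, mbcnt);
	return (0);
}
```

With the defaults above it reports 8 one-byte datagrams; with waste = 30 it
reports 240; and with payload = 1972 it stops after one datagram with 1536
bytes of space left, which matches the observed 1520-byte second datagram
plus its 16-byte address.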