BPF packet pagesize limit

Tue Nov 21 16:14:23 UTC 2017

On 20/11/2017 11:26 pm, Kristof Provost wrote:
> On 19 Nov 2017, at 19:49, Catalin Salgau wrote:
> 
>     I'm trying to address the limitation in (upstream) net/vblade that was
>     brought up in
>     https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205164
>     This is related to writes larger than hw.pagesize but smaller than the
>     configured MTU with BPF.
>     I traced this to sys/net/bpf.c where calls to bpfwrite() will use
>     bpf_movein() which in turn uses m_get2() to allocate a single mbuf. This
>     will fail if the requested mbuf size is larger than MJUMPAGESIZE
>     (defined as PAGE_SIZE on x86). I believe this should use m_getm2() and
>     populate multiple mbufs.
>     Code in NetBSD explicitly notes that they omit mbuf chaining, but this
>     is not documentated behaviour in the man page.
> 
> Your analysis looks correct.
> 
>     Any chance of having this fixed in a supported release, or getting a
>     usable/documented workaround?
> 
> Can you see if this works for you?
> 
> |diff --git a/sys/net/bpf.c b/sys/net/bpf.c index
> b176856cf35..b9ff40699bb 100644 --- a/sys/net/bpf.c +++ b/sys/net/bpf.c
> @@ -547,9 +547,11 @@ bpf_movein(struct uio *uio, int linktype, struct
> ifnet *ifp, struct mbuf **mp, if (len < hlen || len - hlen >
> ifp->if_mtu) return (EMSGSIZE); - m = m_get2(len, M_WAITOK, MT_DATA,
> M_PKTHDR); + m = m_getm2(NULL, len, M_WAITOK, MT_DATA, M_PKTHDR); if (m
> == NULL) return (EIO); + KASSERT(m->m_next == NULL, ("mbuf chains not
> supported here")); + m->m_pkthdr.len = m->m_len = len; *mp = m; |
> 
> It’s a little icky to trust that this will produce a single mbuf rather
> than a chain, but it appears to be the case. Sadly the rest of the bpf
> code (and especially bpf_filter()) really needs the mbuf to have a
> single contiguous buffer.

Actually m_getm2() will always produce a chain for a size larger than
the page size, due to m_getjcl() being called with MJUMPAGESIZE every
time a large buffer is requested. The function could probably be called
with MJUM9BYTES in this case, but this should be dependant on backing
interface configuration(?).
On the other hand, as you pointed out, bpf_filter really needs a single
mbuf, and so does the call to uiomove(). The filter call, as it stands,
will overread due to being passed the larger len value, instead of the
mbuf's len.
As a note, to avoid the overruns and related panics, I'd suggest anyone
else trying this replace the assertion with an explicit
	if (m->m_next != NULL) {
		error = EIO;
		goto bad;
	}

It is quite clear to me that changing bpf_filter to handle a chain would
be very expensive for the BPF VM. One could probably have a caveat that
the filter will only run on the first mbuf(for reads and writes). I
don't think it would be legal to alter the uio in-place and run the
filter on that, if it's contiguous.

Any alternatives to this? I couldn't figure out if zero-copy mode chains
to bpf_write calls or not. Any insights?

Thanks
> 
> Regards,
> Kristof
>