Small patch to multicast code...

gnn at FreeBSD.org gnn at FreeBSD.org
Fri Aug 22 21:40:05 UTC 2008


At Fri, 22 Aug 2008 19:43:03 +0100,
Bruce M. Simpson wrote:
> 
> We end up calling if_simloop() from a few "interesting" places, in 
> particular the kernel PIM packet handler.
> 
> In this particular case we're going to take a full mbuf chain copy every 
> time we send a packet which needs to be looped back to userland.

Right, I know the penalty.

> It's been a while since I've done any in-depth FreeBSD work other
> than hacking on the IGMPv3 snap, and my time is largely tied up with
> other work these days, sadly.
> 
> It doesn't seem right to my mind that we need to make a full copy of
> an mbuf chain with m_dup() to workaround this kind of problem.
> 
> Whilst it may suffice for a band-aid workaround, we may see mbuf
> pool fragmentation as packet rates go up.
> 
> However we are now in a "new world order" where mbuf chains may be
> very tied to the device where they've originated or to where they're
> going.  It isn't clear to me where this kind of intrusion is
> happening.
> 
> In the case of ip_mloopback(), somehow we are stomping on a
> read-only copy of an mbuf chain. The use of m_copy() with m_pullup()
> there is fine according to the documented uses of mbuf(9), although
> as Luigi pointed out, most likely we need to look at the upper-layer
> protocol too, e.g.  where UDP checksums are also being offloaded.
> 
> Some of the code in the IGMPv3 branch actually reworks how loopback
> happens i.e. the preference is not to loop back wherever possible
> because of the locking implications. Check the bms_netdev branch
> history for more info.


Well, what I suspect is the problem are these bits:

udp_output():

	/*
	 * Set up checksum and output datagram.
	 */
	if (udp_cksum) {
		if (inp->inp_flags & INP_ONESBCAST)
			faddr.s_addr = INADDR_BROADCAST;
		ui->ui_sum = in_pseudo(ui->ui_src.s_addr, faddr.s_addr,
		    htons((u_short)len + sizeof(struct udphdr) + IPPROTO_UDP));
		m->m_pkthdr.csum_flags = CSUM_UDP;
		m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum);
	} else

ip_mloopback():


	copym = m_copy(m, 0, M_COPYALL);
	if (copym != NULL && (copym->m_flags & M_EXT || copym->m_len < hlen))
		copym = m_pullup(copym, hlen);
	if (copym != NULL) {
		/* If needed, compute the checksum and mark it as valid. */
		if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA) {
			in_delayed_cksum(copym);
			copym->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA;
			copym->m_pkthdr.csum_flags |=
			    CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
			copym->m_pkthdr.csum_data = 0xffff;
		}

and:

in_delayed_cksum(struct mbuf *m)
{
	struct ip *ip;
	u_short csum, offset;

	ip = mtod(m, struct ip *);
	offset = ip->ip_hl << 2 ;
	csum = in_cksum_skip(m, ip->ip_len, offset);
	if (m->m_pkthdr.csum_flags & CSUM_UDP && csum == 0)
		csum = 0xffff;
	offset += m->m_pkthdr.csum_data;	/* checksum offset */


Somehow the data that the device needs to do the proper checksum
offload is getting trashed here.  Now, since it's clear we need a
writable packet structure so that we don't trash the original, I'm
wondering if the m_pullup() will be sufficient.

Best,
George


More information about the freebsd-net mailing list