Small patch to multicast code...
gnn at FreeBSD.org
gnn at FreeBSD.org
Fri Aug 22 21:40:05 UTC 2008
At Fri, 22 Aug 2008 19:43:03 +0100,
Bruce M. Simpson wrote:
>
> We end up calling if_simloop() from a few "interesting" places, in
> particular the kernel PIM packet handler.
>
> In this particular case we're going to take a full mbuf chain copy every
> time we send a packet which needs to be looped back to userland.
Right, I know the penalty.
> It's been a while since I've done any in-depth FreeBSD work other
> than hacking on the IGMPv3 snap, and my time is largely tied up with
> other work these days, sadly.
>
> It doesn't seem right to my mind that we need to make a full copy of
> an mbuf chain with m_dup() to workaround this kind of problem.
>
> Whilst it may suffice for a band-aid workaround, we may see mbuf
> pool fragmentation as packet rates go up.
>
> However we are now in a "new world order" where mbuf chains may be
> very tied to the device where they've originated or to where they're
> going. It isn't clear to me where this kind of intrusion is
> happening.
>
> In the case of ip_mloopback(), somehow we are stomping on a
> read-only copy of an mbuf chain. The use of m_copy() with m_pullup()
> there is fine according to the documented uses of mbuf(9), although
> as Luigi pointed out, most likely we need to look at the upper-layer
> protocol too, e.g. where UDP checksums are also being offloaded.
>
> Some of the code in the IGMPv3 branch actually reworks how loopback
> happens i.e. the preference is not to loop back wherever possible
> because of the locking implications. Check the bms_netdev branch
> history for more info.
Well, what I suspect is the problem are these bits:
udp_output():
/*
* Set up checksum and output datagram.
*/
if (udp_cksum) {
if (inp->inp_flags & INP_ONESBCAST)
faddr.s_addr = INADDR_BROADCAST;
ui->ui_sum = in_pseudo(ui->ui_src.s_addr, faddr.s_addr,
htons((u_short)len + sizeof(struct udphdr) + IPPROTO_UDP));
m->m_pkthdr.csum_flags = CSUM_UDP;
m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum);
} else
ip_mloopback():
copym = m_copy(m, 0, M_COPYALL);
if (copym != NULL && (copym->m_flags & M_EXT || copym->m_len < hlen))
copym = m_pullup(copym, hlen);
if (copym != NULL) {
/* If needed, compute the checksum and mark it as valid. */
if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA) {
in_delayed_cksum(copym);
copym->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA;
copym->m_pkthdr.csum_flags |=
CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
copym->m_pkthdr.csum_data = 0xffff;
}
and:
in_delayed_cksum(struct mbuf *m)
{
struct ip *ip;
u_short csum, offset;
ip = mtod(m, struct ip *);
offset = ip->ip_hl << 2 ;
csum = in_cksum_skip(m, ip->ip_len, offset);
if (m->m_pkthdr.csum_flags & CSUM_UDP && csum == 0)
csum = 0xffff;
offset += m->m_pkthdr.csum_data; /* checksum offset */
Somehow the data that the device needs to do the proper checksum
offload is getting trashed here. Now, since it's clear we need a
writable packet structure so that we don't trash the original, I'm
wondering if the m_pullup() will be sufficient.
Best,
George
More information about the freebsd-net
mailing list