fragmentation problem in FreeBSD 7
Sebastian Kuzminsky
seb at lineratesystems.com
Tue Oct 23 18:26:00 UTC 2012
Hi folks, this is my first post to freebsd-net, and my first bug-fix
submission... I hope this is the right mailing list for this issue, and
the right format for sending in patches....
I'm working on a derivative of FreeBSD 7.
I've run into a problem with IP header checksums when fragmenting to an
e1000 (em) interface, and I've narrowed it down to a very simple test. The
test setup is like this:
[computer A]---(network 1)---[computer B]---(network 2)---[computer C]
That gorgeous drawing shows computer A connected to computer B via network
1, and computer B connected to computer C via network 2. Computer B is set
up to forward packets between networks 1 and 2. A can see B but not C. C
can see B but not A. B forwards between A and C. Pretty simple.
One of B's NICs is a Broadcom, handled by the bce driver; this one works
fine in all my testing.
B's other NIC is an Intel PRO/1000 handled by the em driver. This is the
one giving me trouble.
The test disables PMTUD on all three hosts. It then sets the MTU of the
bce and em interfaces to the unrealistically low value of 72 bytes, and
tries to pass TCP packets back and forth using nc on computers A and C
(with computer B acting as a gateway). This is to force the B gateway to
fragment the TCP frames it forwards.
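For a sense of how much fragmentation a 72-byte MTU forces, here is a rough sketch of the arithmetic in C. It assumes a 20-byte IP header with no options, and that every fragment (including the last) is capped at the same payload size; the helper names are made up for illustration and are not kernel functions.

```c
#include <stddef.h>

/*
 * Payload bytes carried by each fragment: the space left after the IP
 * header, rounded down to a multiple of 8 because the IP fragment
 * offset field counts 8-byte units.
 */
static size_t
frag_payload_len(size_t mtu, size_t ip_hlen)
{
	return (mtu - ip_hlen) & ~(size_t)7;
}

/*
 * Number of fragments for a packet that does need fragmenting and
 * carries ip_payload bytes after the IP header.
 */
static size_t
frag_count(size_t ip_payload, size_t mtu, size_t ip_hlen)
{
	size_t per = frag_payload_len(mtu, ip_hlen);

	return (ip_payload + per - 1) / per;
}
```

With mtu = 72 and ip_hlen = 20 each fragment carries 48 bytes, so under these assumptions even a 1000-byte IP payload turns into 21 fragments.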
Receiving on the em and sending on the bce works just fine (as noted
above). Small TCP frames that fit in the MTU, big TCP frames that get
fragmented, no problems.
Receiving on the bce and sending on the em interface works fine for small
TCP frames that don't need fragmentation, but when B has to fragment the IP
packets before sending them out the em, the IP header checksums in the IP
packets that appear on the em's wires are wrong. I came to this conclusion
by packet capture and by watching the 'bad header checksums' counter of
'netstat -s -p ip', both running on the computer receiving the fragments.
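For anyone who wants to check captured fragments by hand, the following is a minimal user-space sketch of IP header checksum verification. It is not the kernel's in_cksum(); the function names are hypothetical, and it assumes the header bytes are in network order exactly as captured from the wire.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of the header, read as big-endian 16-bit words. */
static uint16_t
ip_hdr_sum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < hlen; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* A header whose checksum field is correct sums to 0xffff. */
static int
ip_hdr_cksum_ok(const uint8_t *hdr, size_t hlen)
{
	return ip_hdr_sum(hdr, hlen) == 0xffff;
}
```

The sum is taken over the whole header, including the checksum field itself, which is why a good header comes out as all ones rather than zero.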
OK, those are all my observations. Next come my thoughts about the cause
and a proposed fix.
The root of the problem is two-fold:
1. ip_output.c:ip_fragment() does not clear the CSUM_IP flag in the mbuf
when it does software IP checksum computation, so the mbuf still looks like
it needs IP checksumming.
2. The em driver does not advertise IP checksum offloading, but still
checks the CSUM_IP flag in the mbuf and modifies the packet when that flag
is set (this is in em_transmit_checksum_setup(), called by em_xmit()).
Unfortunately the em driver gets the checksum wrong in this case; I guess
that's why it doesn't advertise this capability in its if_hwassist!
So the fragments that ip_fastfwd.c:ip_fastforward() gets from
ip_output.c:ip_fragment() have ip->ip_sum set correctly, but the
mbuf->m_pkthdr.csum_flags incorrectly has CSUM_IP still set, and this
causes the em driver to emit incorrect packets.
There are some other callers of ip_fragment(), notably ip_output().
ip_output() clears CSUM_IP in the mbuf csum_flags itself if it's not in
if_hwassist, so it avoids this problem.
So, the fix is simple: clear the mbuf's CSUM_IP when computing ip->ip_sum
in ip_fragment(). The first attached patch (against
gitorious/svn_stable_7) does this.
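The idea of the fix can be modelled in user space as below. This is only a sketch and not the patch itself: struct pkt_model, finish_fragment() and the do_sw_ip_csum parameter are hypothetical stand-ins, and the CSUM_IP value is illustrative (the real definition lives in sys/mbuf.h).

```c
#include <stdint.h>

#define CSUM_IP	0x0001	/* illustrative value, see sys/mbuf.h for the real one */

/* Hypothetical stand-in for one fragment's mbuf plus IP header. */
struct pkt_model {
	uint32_t csum_flags;	/* models m->m_pkthdr.csum_flags */
	uint16_t ip_sum;	/* models ip->ip_sum */
};

/*
 * Models the per-fragment checksum step. When the checksum is computed
 * in software, also clear CSUM_IP so that nothing downstream (such as
 * a driver) believes the header still needs checksumming.
 */
static void
finish_fragment(struct pkt_model *p, int do_sw_ip_csum, uint16_t computed_sum)
{
	if (do_sw_ip_csum) {
		p->ip_sum = computed_sum;	/* software checksum result */
		p->csum_flags &= ~CSUM_IP;	/* the fix: mark it as done */
	}
}
```

Without the flag-clearing line, the fragment leaves with a correct ip_sum and CSUM_IP still set, which is the combination that trips up the em driver.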
In looking at this issue, I noticed that ip_output()'s use of sw_csum is
inconsistent. ip_output() splits the mbuf's csum_flags into two parts: the
stuff that hardware will assist with (these flags get left in the mbuf) and
the stuff that software needs to do (these get moved to sw_csum). But
later ip_output() calls functions that don't get sw_csum, or that don't
know to look in it and look in the mbuf instead. My second patch fixes
these kinds of issues and (IMO) simplifies the code by leaving all the
packet's checksumming needs in the mbuf, getting rid of sw_csum entirely.
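The hardware/software split described above amounts to masking the packet's flags against the interface's capabilities. Here is a small user-space sketch of that split; struct csum_split and split_csum() are hypothetical names, and the flag values are illustrative rather than copied from the kernel headers.

```c
#include <stdint.h>

/* Illustrative flag values; the real definitions are in sys/mbuf.h. */
#define CSUM_IP		0x0001
#define CSUM_TCP	0x0002

struct csum_split {
	uint32_t hw;	/* stays in the mbuf's csum_flags for the driver */
	uint32_t sw;	/* what ip_output() tracks separately as sw_csum */
};

/* Split a packet's checksum needs by what the interface can offload. */
static struct csum_split
split_csum(uint32_t csum_flags, uint32_t if_hwassist)
{
	struct csum_split s;

	s.hw = csum_flags & if_hwassist;
	s.sw = csum_flags & ~if_hwassist;
	return s;
}
```

The two halves never overlap and together cover the original flags, so any function that only sees the mbuf is missing whatever ended up in the software half.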
--
Sebastian Kuzminsky
Linerate Systems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Update-the-mbuf-csum_flags-of-IP-fragments-when-comp.patch
Type: application/octet-stream
Size: 4620 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20121023/978ec910/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Simplify-the-tracking-of-mbuf-checksumming-needs.patch
Type: application/octet-stream
Size: 8913 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20121023/978ec910/attachment-0001.obj>