fragmentation problem in FreeBSD 7

Gleb Smirnoff glebius at FreeBSD.org
Fri Oct 26 13:53:57 UTC 2012


  Sebastian,

On Tue, Oct 23, 2012 at 12:25:59PM -0600, Sebastian Kuzminsky wrote:
S> The root of the problem is two-fold:
S> 
S> 1.  ip_output.c:ip_fragment() does not clear the CSUM_IP flag in the mbuf
S> when it does software IP checksum computation, so the mbuf still looks like
S> it needs IP checksumming.
S> 
S> 2. The em driver does not advertise IP checksum offloading, but still
S> checks the CSUM_IP flag in the mbuf and modifies the packet when that flag
S> is set (this is in em_transmit_checksum_setup(), called by em_xmit()).
S>  Unfortunately the em driver gets the checksum wrong in this case, i guess
S> that's why it doesn't advertise this capability in its if_hwassist!
S> 
S> So the fragments that ip_fastfwd.c:ip_fastforward() gets from
S> ip_output.c:ip_fragment() have ip->ip_sum set correctly, but the
S> mbuf->m_pkthdr.csum_flags incorrectly has CSUM_IP still set, and this
S> causes the em driver to emit incorrect packets.
S> 
S> There are some other callers of ip_fragment(), notably ip_output().
S>  ip_output() clears CSUM_IP in the mbuf csum_flags itself if it's not in
S> if_hwassist, so avoids this problem.
S> 
S> So, the fix is simple: clear the mbuf's CSUM_IP when computing ip->ip_sum
S> in ip_fragment().  The first attached patch (against
S> gitorious/svn_stable_7) does this.
S> 
S> In looking at this issue, I noticed that ip_output()'s use of sw_csum is
S> inconsistent.  ip_output() splits the mbuf's csum_flags into two parts: the
S> stuff that hardware will assist with (these flags get left in the mbuf) and
S> the stuff that software needs to do (these get moved to sw_csum).  But
S> later ip_output() calls functions that don't get sw_csum, or that don't
S> know to look in it and look in the mbuf instead.  My second patch fixes
S> these kinds of issues and (IMO) simplifies the code by leaving all the
S> packet's checksumming needs in the mbuf, getting rid of sw_csum entirely.

Thanks for submission!

I'm about to commit the attached patch to head. Can you please review it?
Haven't I missed anything important?

I have tested it in a 3-box setup similar to yours, except the middle box
doesn't have em(4) NIC, it has igb(4). I've tested it with rxcsum/txcsum on
and off on both NICs.

I have also moved from CSUM_DELAY_IP to CSUM_IP. AFAIU, the alias CSUM_DELAY_IP
was made to match CSUM_DELAY_DATA. But to my point of view it makes it more
difficult to understand code, because a person reading code sees different
constants in the stack and in drivers. Since your change touches every line
in the stack, that utilizes CSUM_DELAY_IP, I decided to consistently use
CSUM_IP constant.

-- 
Totus tuus, Glebius.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: csum.diff
Type: text/x-diff
Size: 6858 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20121026/35a1cc46/attachment.diff>


More information about the freebsd-net mailing list