kern/167325: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC

Andre Oppermann oppermann at networx.ch
Wed Sep 12 06:07:10 UTC 2012


On 07.09.2012 23:44, Jeremiah Lott wrote:
> On Apr 27, 2012, at 2:07 AM, linimon at FreeBSD.org wrote:
>
>> Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
>> New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=167325
>
> I did an analysis of this pr a while back and I figured I'd share.  Definitely looks like a real
> problem here, but at least in 8.2 it is difficult to hit it.  First off, vlan tagging is not
> required to hit this.  The code in question does not account for any amount of link-layer header,
> so you can reproduce the bug even without vlans.
>
> In order to trigger it, the tcp stack must choose to send a tso "packet" with a total size
> (including tcp+ip header and options, but not link-layer header) between 65522 and 65535 bytes
> (because adding the 14-byte link-layer header will then exceed the 64K limit).  In 8.1, the tcp stack
> only chooses to send tso bursts that will result in full mtu-size on-wire packets.  To achieve
> this, it will truncate the tso packet size to be a multiple of mss, not including header and tcp
> options.  The check has been relaxed a little in head, but the same basic check is still there.
> None of the "normal" mtus have multiples falling in this range.  To reproduce it I used an mtu of
> 1445.  When timestamps are in use, every packet has a 40-byte tcp/ip header + 10 bytes for the
> timestamp option + 2 bytes of padding.  You can get a packet length of 65523 as follows:
>
> 65523 - (40 + 10 + 2) = 65471 (size of tso packet data)
> 65471 / 47 = 1393 (size of data per on-wire packet)
> 1393 + (40 + 10 + 2) = 1445 (mtu is data + header + options + pad)
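
The arithmetic above can be checked with a short Python sketch.  It models the truncation as the
message describes it (the largest whole multiple of the per-segment payload that still fits under
the 65535-byte IP length limit).  The function names and constants are illustrative assumptions,
not taken from the FreeBSD source.

```python
IP_MAXPACKET = 65535      # largest value that fits in ip->ip_len
HDR_LEN = 40 + 10 + 2     # tcp/ip header + timestamp option + pad
LINK_HDR_LEN = 14         # link header size quoted in the message


def max_tso_packet_len(mtu):
    """Largest TSO packet length (headers included) for a given mtu.

    Assumption: the stack truncates the payload to a whole multiple of
    the per-segment payload, as the message describes for 8.1.
    """
    seg_payload = mtu - HDR_LEN
    segments = (IP_MAXPACKET - HDR_LEN) // seg_payload
    return segments * seg_payload + HDR_LEN


def exceeds_64k_with_link_header(mtu):
    """True when adding the link header pushes the packet past 64K."""
    return max_tso_packet_len(mtu) + LINK_HDR_LEN > IP_MAXPACKET
```

Under this model, max_tso_packet_len(1445) gives 65523, while mtus of 1500 and 9000 give 65212 and
62688, both below the 65522 threshold.
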
>
> Once you set your mtu to 1445, you need a program that can get the stack to send a maximum sized
> packet.  With the congestion window that can be more difficult than it seems.  I used some python
> that sends enough data to open the window, sleeps long enough to drain all outstanding data, but
> not long enough for the congestion window to go stale and close again, then sends a bunch more
> data.  It also helps to turn off delayed acks on the receiver.  Sometimes you will not drain the
> entire send buffer because an ack for the final chunk is still delayed when you start the second
> transmit.  When the problem described in the pr hits, the EINVAL from bus_dmamap_load_mbuf_sg
> bubbles right up to userspace.
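
The send pattern described above might look roughly like the following Python sketch.  This is not
the original script, which was not posted.  The function names, chunk size, and return convention
are assumptions, and the byte counts and idle time would need tuning for a real link.

```python
import errno
import socket
import time

CHUNK = b"\0" * 65536  # payload content is irrelevant for this test


def _send_n(sock, nbytes):
    """Hand nbytes of filler data to the kernel via sendall()."""
    remaining = nbytes
    while remaining > 0:
        piece = CHUNK[:min(remaining, len(CHUNK))]
        sock.sendall(piece)
        remaining -= len(piece)


def burst_idle_burst(sock, warmup_bytes, idle_seconds, burst_bytes):
    """Warm-up burst, short idle, then a second burst.

    The warm-up is meant to open the congestion window; the idle period
    should be long enough for outstanding data to drain but short enough
    that the window does not go stale.  Returns False if a send fails
    with EINVAL (the symptom in the PR), True otherwise.
    """
    try:
        _send_n(sock, warmup_bytes)
        time.sleep(idle_seconds)
        _send_n(sock, burst_bytes)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise
    return True
```

To use it, connect a TCP socket with socket.create_connection() to a receiver that simply reads and
discards data, then call burst_idle_burst() in a loop until it returns False.
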
>
> At first I thought this was a driver bug rather than stack bug.  The code in question does what
> it is commented to do (limit the tso packet so that ip->ip_len does not overflow).  However, it
> also seems reasonable that the driver limit its dma tag at 64K (do we really want it allocating
> another whole page just for the 14-byte link-layer header?).  Perhaps the tcp stack should ensure
> that the tso packet + max_linkhdr is < 64K.  Comments?

Thank you for the analysis.  I'm looking into it.

> As an aside, the patch attached to the pr is also slightly wrong.  Taking the max_linkhdr into
> account when rounding the packet to be a multiple of mss does not make sense; it should only take
> it into account when calculating the max tso length.
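
A minimal sketch of the ordering suggested here, written as a Python model rather than kernel code.
The function and parameter names are hypothetical, and this is not the patch from the PR or the
eventual fix.  It only shows the link header being subtracted when computing the maximum TSO length
and left out of the rounding to a multiple of mss.

```python
IP_MAXPACKET = 65535  # largest value that fits in ip->ip_len


def tso_payload_len(avail, hdr_len, seg_payload, max_linkhdr):
    """Payload length for one TSO send, in bytes.

    avail        bytes waiting in the send buffer
    hdr_len      tcp/ip header plus options
    seg_payload  data bytes per on-wire packet (mss minus options)
    max_linkhdr  room reserved for the link header
    """
    # Cap: link header + tcp/ip header + payload must stay under 64K.
    max_payload = IP_MAXPACKET - hdr_len - max_linkhdr
    length = min(avail, max_payload)
    # Rounding: whole segments only; the link header plays no part here.
    if length > seg_payload:
        length -= length % seg_payload
    return length
```

With the numbers from the analysis (hdr_len 52, seg_payload 1393, max_linkhdr 14) and plenty of
data queued, this model returns 64078 bytes (46 segments) instead of 65471 (47 segments), so the
packet plus link header stays under 64K.
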

-- 
Andre

