kern/167325: [netinet] [patch] sosend sometimes return EINVAL
with TSO and VLAN on 82599 NIC
Andre Oppermann
oppermann at networx.ch
Wed Sep 12 06:07:10 UTC 2012
On 07.09.2012 23:44, Jeremiah Lott wrote:
> On Apr 27, 2012, at 2:07 AM, linimon at FreeBSD.org wrote:
>
>> Old Synopsis: sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
>> New Synopsis: [netinet] [patch] sosend sometimes return EINVAL with TSO and VLAN on 82599 NIC
>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=167325
>
> I did an analysis of this pr a while back and I figured I'd share. Definitely looks like a real
> problem here, but at least in 8.2 it is difficult to hit it. First off, vlan tagging is not
> required to hit this. The code in question does not account for any amount of link-local header,
> so you can reproduce the bug even without vlans.
>
> In order to trigger it, the tcp stack must choose to send a tso "packet" with a total size
> (including tcp+ip header and options, but not link-local header) between 65522 and 65535 bytes
> (because adding the 14-byte link-local header will then exceed the 64K limit). In 8.1, the tcp stack
> only chooses to send tso bursts that will result in full mtu-size on-wire packets. To achieve
> this, it will truncate the tso packet size to be a multiple of mss, not including header and tcp
> options. The check has been relaxed a little in head, but the same basic check is still there.
> None of the "normal" mtus have multiples falling in this range. To reproduce it I used an mtu of
> 1445. When timestamps are in use, every packet has a 40-byte tcp/ip header + 10 bytes for the
> timestamp option + 2 bytes pad. You can get a packet length 65523 as follows:
>
> 65523 - (40 + 10 + 2) = 65471 (size of tso packet data)
> 65471 / 47 = 1393 (size of data per on-wire packet)
> 1393 + (40 + 10 + 2) = 1445 (mtu is data + header + options + pad)
>
> Once you set your mtu to 1445, you need a program that can get the stack to send a maximum sized
> packet. With the congestion window, that can be more difficult than it seems. I used some python
> that sends enough data to open the window, sleeps long enough to drain all outstanding data, but
> not long enough for the congestion window to go stale and close again, then sends a bunch more
> data. It also helps to turn off delayed acks on the receiver. Sometimes you will not drain the
> entire send buffer because an ack for the final chunk is still delayed when you start the second
> transmit. When the problem described in the pr hits, the EINVAL from bus_dmamap_load_mbuf_sg
> bubbles right up to userspace.
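For reference, a sender along those lines might look roughly like the C sketch below. The original repro was a Python script; the address, port, burst sizes, and sleep time here are illustrative guesses, and the interface MTU is assumed to already be set to 1445.

	/*
	 * Hypothetical reproduction sketch in C (the original repro was a
	 * Python script).  Assumes the interface MTU is already 1445, the
	 * receiver at 198.51.100.10:9000 (made-up address/port) discards
	 * incoming data, and delayed acks are disabled on the receiver
	 * (sysctl net.inet.tcp.delayed_ack=0).
	 */
	#include <arpa/inet.h>
	#include <err.h>
	#include <netinet/in.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <unistd.h>

	static char buf[256 * 1024];

	int
	main(void)
	{
		struct sockaddr_in sin;
		int i, s;

		memset(buf, 'x', sizeof(buf));
		memset(&sin, 0, sizeof(sin));
		sin.sin_family = AF_INET;
		sin.sin_port = htons(9000);
		inet_pton(AF_INET, "198.51.100.10", &sin.sin_addr);

		if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
			err(1, "socket");
		if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			err(1, "connect");

		/* First burst: enough data to open the congestion window. */
		for (i = 0; i < 16; i++)
			if (write(s, buf, sizeof(buf)) < 0)
				err(1, "write (warm-up)");

		/*
		 * Sleep long enough for the send buffer to drain, but not so
		 * long that the congestion window goes stale and closes again.
		 */
		usleep(200 * 1000);

		/*
		 * Second burst: with the window wide open the stack may build
		 * a maximum-sized TSO packet; at MTU 1445 with timestamps that
		 * packet plus the 14-byte Ethernet header exceeds 64K and the
		 * EINVAL from bus_dmamap_load_mbuf_sg shows up here.
		 */
		for (i = 0; i < 16; i++)
			if (write(s, buf, sizeof(buf)) < 0)
				err(1, "write (trigger)");

		close(s);
		return (0);
	}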
>
> At first I thought this was a driver bug rather than stack bug. The code in question does what
> it is commented to do (limit the tso packet so that ip->ip_len does not overflow). However, it
> also seems reasonable that the driver limit its dma tag at 64K (do we really want it allocating
> another whole page just for the 14-byte link-local header?). Perhaps the tcp stack should ensure
> that the tso packet + max_linkhdr is < 64K. Comments?
Thank you for the analysis. I'm looking into it.
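In code terms, the suggestion amounts to something like the following hypothetical sketch (tso_clamp and its parameters are made-up names, not the actual tcp_output() variables):

	/*
	 * Hypothetical sketch of the suggested clamp, not the actual
	 * tcp_output() code: keep payload + TCP/IP headers + link-level
	 * header within IP_MAXPACKET (65535) so the driver's 64K DMA tag
	 * is never exceeded.
	 */
	#include <netinet/ip.h>		/* IP_MAXPACKET */

	static long
	tso_clamp(long tso_payload, int hdrlen, int linkhdrlen)
	{
		/*
		 * tso_payload: data the stack wants to send in one TSO burst.
		 * hdrlen:      TCP/IP header plus options (e.g. 52 with timestamps).
		 * linkhdrlen:  max_linkhdr; 14 for plain Ethernet, 18 with an
		 *              in-line VLAN tag.
		 */
		if (tso_payload + hdrlen + linkhdrlen > IP_MAXPACKET)
			tso_payload = IP_MAXPACKET - hdrlen - linkhdrlen;
		return (tso_payload);
	}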
> As an aside, the patch attached to the pr is also slightly wrong. Taking the max_linkhdr into
> account when rounding the packet to be a multiple of mss does not make sense; it should only be
> taken into account when calculating the max tso length.
--
Andre