Driver patch to look at...
Randy Stewart
randall at lakerest.net
Tue Feb 5 10:49:02 UTC 2013
On Feb 4, 2013, at 3:24 PM, John Baldwin wrote:
> On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote:
>> All:
>>
>> I have been working with TCP in gigabit networks (igb driver actually) and have
>> found a very nasty problem with the way the driver is doing its put back when
>> it fills the out-bound transmit queue.
>>
>> Basically it has taken a packet from the head of the ring buffer, and then
>> realizes it can't fit it into the transmit queue. So it just re-enqueue's it
>> into the ring buffer. Whats wrong with that? Well most of the time there
>> are anywhere from 10-50 packets (maybe more) in that ring buffer when you are
>> operating at full speed (or trying to). This means you will see 10 duplicate
>> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually.
>>
>> The patch I have attached makes it so that
>>
>> 1) There are ways to swap back.
>> 2) Use the peek in the ring buffer and only
>> dequeue the packet if we put it into the transmit ring
>> 3) If something goes wrong and the transmit frees the packet we dequeue it.
>> 4) If the transmit changed it (defrag etc) then swap out the new mbuf that
>> has taken its place.
>>
>> I have fixed the four intel drivers that had this systemic issue, but there
>> are still more to fix.
>>
>> Comments/review .. rotten egg's etc.. would be most welcome before
>> I commit this..
>
> Does this only happen in drivers that use buffering?
Yep, there are a lot of drivers that *do not* use the drbr_xxxx() functions and
for those they do the IFQ_DRV_PREPEND().. its only the newer drivers like the
intel 1Gig and 10Gig ones that seem effected
Also effected are :
bxe
cxgb
oce
en
I have not fixed those yet.
> I seem to recall that
> drivers using IFQ would just stuff the packet at the head of the IFQ via
> IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit.
> See, for example, this bit in dc_start_locked():
>
> for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) {
> /*
> * If there's no way we can send any packets, return now.
> */
> if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) {
> ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> break;
> }
> IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
> if (m_head == NULL)
> break;
>
> if (dc_encap(sc, &m_head)) {
> if (m_head == NULL)
> break;
> IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> break;
> }
>
> It sounds like drbr/buf_ring just don't handle this case correctly? It
> seems a shame to have to duplicate so much code in the various drivers to
> fix this, but that seems to be par for the course when using buf_ring. :(
> (buggy in edge cases and lots of duplicated code that is).
> Also, doing the drbr_swap() just so that drbr_dequeue() returns what you
> just swapped in seems... odd. It seems that it would be nicer instead
> to have some sort of drbr_peek() / drbr_advance() where the latter just
> skips over whatever the current head is? Then you could have something
> like:
>
> while ((next = drbr_peek()) != NULL) {
> if (!foo_encap(&next)) {
> if (next == NULL)
> drbr_advance();
> break;
> }
> drbr_advance();
> }
>
That was what I originally did (without the rename), but there is a sure crash waiting in that.
The only difference from what I originally had was just drbr_dequeue().. but
I was being a bit lazy and not wanting to add yet another function to the
drbr_xxxx code since essential it would be a clone of drbr_dequeue() without
returning the mbuf.
The crash potential here is in that foo_encap(&next) often times will return
a different mbuf (at least in the igb driver it does). I believe its due
to either the m_pullup() which could change the lead mbuf you want
in the drbr_ring, or the m_defrag all within igb_xmit. Thus you have
to track what comes back from the !foo_encap() call and compare it to
see if you must swap it out.
> I guess the sticky widget is the case of ATLQ as you need to dequeue the
> packet always in the ALTQ case and put it back if the encap fails.
Yeah ALTQ is ugly and IMO we need to re-write it anyway.. maybe re-think
this whole layer ;-0
> For
> your patch it's not clear to me how that works. It seems that if the
> encap routine frees the mbuf you try to dereference a freed pointer when
> you call drbr_dequeue().
Hmm you are right.. I forgot how we keep those using the mbuf itself...
> I really think you will instead need some sort
> of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case and
> use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case.
We could do that but drbr_putback() would probably need both the old
and new pointers and then we could make it do the ring_swap() to put
the right mbuf in place..
Let me go explore that and come up with a better patch ;-)
R
>
> --
> John Baldwin
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>
-----
Randall Stewart
randall at lakerest.net
More information about the freebsd-net
mailing list