cvs commit: src/sys/dev/bge if_bge.c
Robert Watson
rwatson at FreeBSD.org
Sat Dec 23 15:19:30 PST 2006
On Sun, 24 Dec 2006, Oleg Bulyzhin wrote:
>>> We currently make this a lot worse than it needs to be by handing off the
>>> received packets one at a time, unlocking and relocking for every packet.
>>> It would be better if the driver's receive interrupt handler would harvest
>>> all of the incoming packets and queue them locally. Then, at the end, hand
>>> off the linked list of packets to the network stack wholesale, unlocking
>>> and relocking only once. (Actually, the list could probably be handed off
>>> at the very end of the interrupt service routine, after the driver has
>>> already dropped its lock.) We wouldn't even need a new primitive, if
>>> ether_input() and the other if_input() functions were enhanced to deal
>>> with a possible list of packets instead of just a single one.
>>
>> I try this experiment every few years, and generally don't measure much
>> improvement. I'll try it again with 10gbps early next year once back in
>> the office again. The more interesting transition is between the link
>> layer and the network layer, which is high on my list of topics to look
>> into in the next few weeks. In particular, reworking the ifqueue handoff.
>> The tricky bit is balancing latency, overhead, and concurrency...
>>
>> FYI, there are several sets of patches floating around to modify if_em to
>> hand off queues of packets to the link layer, etc. They probably need
>> updating, of course, since if_em has changed quite a bit in the last year.
>> In my implementation, I add a new input routine that accepts mbuf packet
>> queues.
>
> I'm just curious, do you remember the average length of the mbuf queue in
> your tests? While experimenting with the bge(4) driver (taskqueue,
> interrupt moderation, converted bge_rxeof() to the above scheme), I found
> it quite easy to exhaust the available mbuf clusters under load (trying to
> queue hundreds of received packets). So I had to limit the rx queue to a
> rather low length.
Off-hand, I don't remember. I do remember it being very important to maintain
bounds on the size of in-flight packet sets at all levels in the stack -- for
the same reason the netisr dispatch queue is bounded. Otherwise if the device
is able to keep the device driver entirely busy, you'll effectively live-lock
since you never dispatch to the next layer, exhaust available memory, etc,
etc. One of the ideas I've been futzing with is "back-pressure" across the
netisr and a "checkout" model in which the total length of the queue spanning
device driver and dispatch through to the protocol has a total bound with
reservations taken by components as they process sets of packets. In this
way, the ithread would know the netisr was already in execution and not
perform a wakeup (and get involved with the scheduler), avoid excessive
memory consumption, etc. Ed Maste has also suggested changing our notion of
mbuf packet queues: our current queue model requires following linked
lists, which makes inefficient use of CPU caches, and he proposes arrays
of mbuf pointers instead. I've done a bit of experimentation along these
lines, but not enough to investigate the properties well.
Robert N M Watson
Computer Laboratory
University of Cambridge