MQ Patch.
Andre Oppermann
andre at freebsd.org
Tue Oct 29 21:25:56 UTC 2013
On 29.10.2013 22:03, Navdeep Parhar wrote:
> On 10/29/13 13:41, Andre Oppermann wrote:
>> Let me jump in here and explain roughly the ideas/path I'm exploring
>> in creating and eventually implementing a big picture for drivers,
>> queues, queue management, various QoS and so on:
>>
>> Situation: We're still mostly based on the old 4.4BSD IFQ model with
>> a couple of work-arounds (sndring, drbr) and the bit-rotten ALTQ we
>> have in tree aren't helpful at all.
>>
>> Steps:
>>
>> 1. take the soft-queuing method out of the ifnet layer and make it
>> a property of the driver, so that the upper stack (or actually
>> protocol L3/L2 mapping/encapsulation layer) calls (*if_transmit)
>> without any queuing at that point. It then is up to the driver
>> to decide how it multiplexes multi-core access to its queue(s)
>> and how they are configured.
>
> It would work out much better if the kernel was aware of the number of
> tx queues of a multiq driver and explicitly selected one in if_transmit.
> The driver has no information on the CPU affinity etc. of the
> applications generating the traffic; the kernel does. In general, the
> kernel has a much better "global view" of the system and some of the
> stuff currently in the drivers really should move up into the stack.
I've been thinking a lot about this and have come to the preliminary
conclusion that the upper stack should not tell the driver which queue to
use. There are way too many possible approaches, each performing better or
worse depending on the use case. Also, we have a big problem with core vs.
queue mismatches either way (more cores than queues or more queues than
cores, though the latter is much less of a problem).
For now I see these primary multi-hardware-queue approaches to be implemented
first:
a) the driver's (*if_transmit) takes the flowid from the mbuf header and
selects one of the N hardware DMA rings based on it. Each of the DMA
rings is protected by a lock. The assumption here is that with enough
DMA rings the contention on each of them will be relatively low, and
ideally a flow and its ring sort of stick to the core that sends lots
of packets into that flow. Of course it is a statistical certainty that
some bouncing will be going on.
b) the driver assigns the DMA rings to particular cores, which can then,
through a critnest++, drive them locklessly. The driver's (*if_transmit)
will look up the core it got called on and push the traffic out on that
DMA ring. The problem is the upper stack's affinity, which is not
guaranteed. This has two consequences: there may be reordering of packets
of the same flow because the protocol's send function happens to be
called from a different core the second time. Or the driver's
(*if_transmit) has to switch to the right core to complete the transmit
for this flow if the upper stack migrated/bounced around. It is rather
difficult to assure full affinity from userspace down through the upper
stack and then to the driver.
c) non-multi-queue capable hardware uses a kernel-provided set of functions
to manage the contention for the single resource of a DMA ring.
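
To make the difference between (a) and (b) concrete, here is a rough
userspace sketch of the two ring selection rules. It is not driver code:
struct pkt is a hypothetical stand-in for the mbuf packet header, and the
function names and the plain modulo mapping are assumptions made only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf packet header fields used here. */
struct pkt {
	uint32_t flowid;	/* flow hash carried with the packet */
};

/*
 * Approach (a): the ring is derived from the flow id alone.  Any core
 * may end up on any ring, so each ring would need its own lock.
 */
static unsigned
ring_by_flowid(const struct pkt *p, unsigned nrings)
{
	assert(nrings > 0);
	return (p->flowid % nrings);
}

/*
 * Approach (b): the ring is derived from the core we are called on.
 * This sketch assumes nrings >= number of cores; with fewer rings the
 * modulo would make cores share a ring and a lock would be needed again.
 */
static unsigned
ring_by_cpu(unsigned curcpu, unsigned nrings)
{
	assert(nrings > 0);
	return (curcpu % nrings);
}
```

With 8 rings, a packet with flowid 42 always lands on ring 2 under (a), no
matter which core sends it. Under (b) the same flow goes out on ring 0 when
sent from core 0 and on ring 3 when sent from core 3, which is exactly the
reordering risk described in (b).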
The point here is that the driver is the right place to make these decisions
because the upper stack lacks knowledge of (and shouldn't care about) the
actually available hardware and its capabilities. All necessary information
is available to the driver as well, through the appropriate mbuf header
fields and the core it is called on.
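
As a rough sketch of what "the driver decides" could look like: the driver
picks a selection policy once, from what its hardware offers, and the upper
stack only ever calls transmit without naming a queue. Everything here is
hypothetical (struct drv_softc, drv_attach, drv_transmit, and in particular
the rule used to choose between the policies), and it runs in userspace
with a stand-in for the mbuf header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf packet header. */
struct pkt {
	uint32_t flowid;
};

struct drv_softc;
typedef unsigned (*ring_select_fn)(const struct drv_softc *,
    const struct pkt *, unsigned curcpu);

/* Hypothetical per-device state. */
struct drv_softc {
	unsigned	nrings;		/* hardware DMA rings available */
	ring_select_fn	select;		/* policy chosen by the driver */
};

/* (c) single ring: everything goes to ring 0 (contention managed elsewhere). */
static unsigned
select_single(const struct drv_softc *sc, const struct pkt *p, unsigned cpu)
{
	(void)sc; (void)p; (void)cpu;
	return (0);
}

/* (a) ring picked from the flow id. */
static unsigned
select_flow(const struct drv_softc *sc, const struct pkt *p, unsigned cpu)
{
	(void)cpu;
	return (p->flowid % sc->nrings);
}

/* (b) ring picked from the calling core. */
static unsigned
select_cpu(const struct drv_softc *sc, const struct pkt *p, unsigned cpu)
{
	(void)p;
	return (cpu % sc->nrings);
}

/* The driver chooses the policy; this particular rule is only an assumption. */
static void
drv_attach(struct drv_softc *sc, unsigned hw_rings, unsigned ncpus)
{
	assert(hw_rings > 0);
	sc->nrings = hw_rings;
	if (hw_rings == 1)
		sc->select = select_single;
	else if (hw_rings >= ncpus)
		sc->select = select_cpu;
	else
		sc->select = select_flow;
}

/* What the upper stack sees: no queue argument.  Returns the ring index. */
static unsigned
drv_transmit(const struct drv_softc *sc, const struct pkt *p, unsigned curcpu)
{
	return (sc->select(sc, p, curcpu));
}
```

The inputs to the decision are only the packet header and the current core,
which is all the driver needs. A different driver could plug in a completely
different policy without the upper stack noticing.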
--
Andre