[rfc] enumerating device / bus domain information
Warner Losh
imp at bsdimp.com
Fri Oct 10 03:54:02 UTC 2014
On Oct 8, 2014, at 5:12 PM, Adrian Chadd <adrian at FreeBSD.org> wrote:
> On 8 October 2014 12:07, Warner Losh <imp at bsdimp.com> wrote:
>>
>> On Oct 7, 2014, at 7:37 PM, Adrian Chadd <adrian at FreeBSD.org> wrote:
>>
>>> Hi,
>>>
>>> Right now we're not enumerating any NUMA domain information about devices.
>>>
>>> The more recent Intel NUMA stuff has some extra affinity information
>>> for devices that (eventually) will allow us to bind kernel/user
>>> threads and/or memory allocation to devices to keep access local.
>>> There's a penalty for DMAing in/out of remote memory, so we'll want to
>>> figure out what counts as "Local" for memory allocation and perhaps
>>> constrain the CPU set that worker threads for a device run on.
>>>
>>> This patch adds a few things:
>>>
>>> * it adds a bus_if.m method for fetching the VM domain ID of a given
>>> device; or ENOENT if it's not in a VM domain;
>>
>> Maybe a default VM domain. All devices are in VM domains :) By default
>> today, we have only one VM domain, and that’s the model that most of the
>> code expects…
>
> Right, and that doesn't change until you compile in with num domains > 1.
The first part of the statement doesn’t change when the number of domains
is more than one. All devices are in a VM domain.
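For concreteness, the bus_if.m method could look something like the sketch
below (the method and function names here are made up for illustration, not
taken from the patch), with a default implementation that preserves today's
single-domain model:

    # bus_if.m sketch; "get_vm_domain" is an illustrative name.
    METHOD int get_vm_domain {
            device_t        dev;
            device_t        child;
            int             *domain;
    } DEFAULT bus_generic_get_vm_domain;

    /*
     * Hypothetical default (it would live in subr_bus.c): with a single
     * VM domain, every device is in domain 0, so there is never a
     * "no domain" case to report.
     */
    static int
    bus_generic_get_vm_domain(device_t bus, device_t child, int *domain)
    {
            *domain = 0;
            return (0);
    }

A default like that keeps the invariant: a bus that knows better can override
it, but nobody ever gets ENOENT.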
> Then, CPUs and memory have VM domains, but devices may or may not have
> a VM domain. There's no "default" VM domain defined if num domains >
> 1.
Please explain how a device cannot have a VM domain. In the
terminology I'm familiar with, to even get cycles to the device, you have to
have a memory address (or an I/O port). That memory address necessarily
maps to some domain, even if that domain is equally sucky
to get to from all CPUs (as is the case with I/O ports). While there may
not be a “default” domain, by virtue of its physical location the device has
to have one.
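On ACPI systems that physical placement is in fact reported by firmware: a
device (or one of its ancestors) can carry a _PXM object naming its proximity
domain. A rough sketch with the existing acpi(4) helpers, leaving out error
handling and the walk up to the nearest ancestor that has _PXM:

    ACPI_HANDLE handle;
    int pxm;

    /* acpi_get_handle() and acpi_GetInteger() are existing acpica glue. */
    handle = acpi_get_handle(dev);
    if (handle != NULL &&
        ACPI_SUCCESS(acpi_GetInteger(handle, "_PXM", &pxm))) {
            /* pxm is the proximity (VM) domain the firmware reports. */
    }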
> The devices themselves don't know about VM domains right now, so
> there's nothing constraining things like IRQ routing, CPU set, memory
> allocation, etc. The Isilon team is working on extending the cpuset
> and allocators to "know" about numa and I'm sure this stuff will fall
> out of whatever they're working on.
Why would the device need to know the domain? Why aren’t the IRQs,
for example, steered to the appropriate CPU? Why doesn’t the bus handle
allocating memory for it in the appropriate place? How does this “domain” tie
into memory allocation and thread creation?
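If the domain were exposed, the pieces to act on it mostly exist already. A
sketch, assuming the get_vm_domain method sketched above and a hypothetical
domain_first_cpu() helper that maps a domain to one of its CPUs:

    int domain, cpu;

    if (BUS_GET_VM_DOMAIN(device_get_parent(dev), dev, &domain) == 0) {
            /* domain_first_cpu() is invented here: pick a local CPU. */
            cpu = domain_first_cpu(domain);
            /* bus_bind_intr(9) already exists to steer an IRQ to a CPU;
             * sc->irq_res is the driver's interrupt resource. */
            bus_bind_intr(dev, sc->irq_res, cpu);
    }

The same query could drive where descriptor rings get allocated and where
worker threads get pinned, which is the tie-in being asked about.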
> So when I go to add sysctl and other tree knowledge for device -> vm
> domain mapping I'm going to make them return -1 for "no domain".
Seems like there are too many things lumped together here. First off, how
can there be no domain? That just hurts my brain. It has to be in some
domain, or it can’t be seen. Maybe this domain is one that sucks for everybody
to access, maybe it is one that’s fast for some CPU or package of CPUs to
access, but it has to have a domain.
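For what it's worth, the sysctl itself looks the same either way; the
disagreement is only over whether -1 is ever a legal value. A sketch (the
softc field is invented):

    /* sc->domain would be filled in at attach time from the bus. */
    SYSCTL_ADD_INT(device_get_sysctl_ctx(dev),
        SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), OID_AUTO,
        "domain", CTLFLAG_RD, &sc->domain, 0,
        "VM domain the device is attached to");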
> (Things will get pretty hilarious later on if we have devices that are
> "local" to two or more VM domains ..)
Well, devices aren’t local to domains, per se. Devices can communicate with
other components in a system at a given cost. One NUMA model is “near” vs. “far”,
where a single near domain exists and all the “far” resources are quite costly. Other
NUMA models may have a wider range of costs, so that some resources are cheap,
others are a little less cheap, while others are downright expensive, depending
on how far across the fabric of interconnects the messages need to travel. While
one can model this as a full one-to-one partitioning, that doesn’t match all of the
extant implementations, even today. It is easy, but an imperfect match to the
underlying realities in many cases (though a very good match to x86, which is
mostly what we care about).
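ACPI already models that wider range of costs directly: the SLIT table is a
matrix of relative distances, with 10 defined as local. Something like the
following (values invented) is a perfectly legal topology that a strict
near/far partitioning cannot express:

    /* SLIT-style distance matrix for four domains; 10 = local per the
     * ACPI spec, larger values are relatively more expensive to reach. */
    static const uint8_t distance[4][4] = {
            { 10, 16, 16, 22 },     /* from domain 0 */
            { 16, 10, 22, 16 },     /* from domain 1 */
            { 16, 22, 10, 16 },     /* from domain 2 */
            { 22, 16, 16, 10 },     /* from domain 3 */
    };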
Warner