RFC: bus_get_cpus(9)
John Baldwin
jhb at freebsd.org
Thu Feb 19 14:46:43 UTC 2015
One of the next steps for NUMA device-awareness is a way to let drivers know
which CPUs are ideal to use for interrupts (and in particular this is targeted
at multiqueue NICs that want to create a TX/RX ring pair per CPU). Rather than
using every CPU, on modern Intel systems at least it is usually best to use CPUs from the
physical processor package that contains the I/O hub that a device connects to
(e.g. to allow DDIO to work).
The PoC API I came up with is a new bus method called bus_get_cpus() that
returns a requested cpuset for a given device. It accepts an enum for the
second parameter that says the type of cpuset being requested. Currently two
values are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the
device when NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For a NIC driver the expectation is that the driver will call
'bus_get_cpus(dev, INTR_CPUS, &set)' and create queues for each of the CPUs in
'set'. (In my current patchset I have updated igb(4) to use this approach.)
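To make the driver side concrete, here is a minimal userland sketch of the "one queue per CPU in 'set'" step. It is not the igb(4) change from the patch: toy_cpuset_t, TOY_MAXCPU and toy_assign_queues() are hypothetical names invented for illustration, and a plain 64-bit mask stands in for the kernel's cpuset_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's cpuset_t: bit i set means
 * CPU i is a member of the set.  Not the real type.
 */
typedef uint64_t toy_cpuset_t;

#define	TOY_MAXCPU	64

/*
 * Walk 'set' and bind one queue to each CPU found, up to max_queues.
 * queue_cpu[q] receives the CPU id chosen for queue q.
 * Returns the number of queues assigned.
 */
static int
toy_assign_queues(toy_cpuset_t set, int queue_cpu[], int max_queues)
{
	int cpu, nqueues;

	assert(max_queues >= 0);
	nqueues = 0;
	for (cpu = 0; cpu < TOY_MAXCPU && nqueues < max_queues; cpu++) {
		if (set & ((toy_cpuset_t)1 << cpu))
			queue_cpu[nqueues++] = cpu;
	}
	return (nqueues);
}
```

In a real driver the set would come from 'bus_get_cpus(dev, INTR_CPUS, &set)' and the loop would presumably use the cpuset(9) macros rather than raw bit tests; the cap on the queue count models a NIC that has fewer hardware queues than there are CPUs in the set.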
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS is mapped to 'all_cpus' by default in the 'root_bus'
driver. INTR_CPUS is also mapped to 'all_cpus' by default.
The x86 interrupt code maintains its own set of interrupt CPUs which this
patch now exposes via INTR_CPUS in the x86 nexus driver.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and NUMA is enabled. They also AND the global
INTR_CPUS set from the nexus driver with the per-domain set from _PXM to
generate a local INTR_CPUS set for child devices.
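The set arithmetic described above can be modeled in a few lines of userland C. This is only a sketch under stated assumptions, not the code in the patch: struct toy_bridge, the TOY_* enum values and toy_bridge_get_cpus() are hypothetical, a 64-bit mask stands in for cpuset_t, and the sets the parent bus would return are passed in as plain arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cpuset_t: bit i set means CPU i is present. */
typedef uint64_t toy_cpuset_t;

/* Hypothetical mirror of the two request types in the proposal. */
enum toy_cpu_sets {
	TOY_LOCAL_CPUS,
	TOY_INTR_CPUS
};

/* Hypothetical model of an ACPI bus or PCI bridge device. */
struct toy_bridge {
	int		has_pxm;	/* _PXM exists and NUMA is enabled */
	toy_cpuset_t	domain_cpus;	/* CPUs in the domain named by _PXM */
};

/*
 * all_cpus and global_intr_cpus model what the parent would return
 * (root_bus and the x86 nexus respectively).  Without _PXM the
 * request just passes through to those sets.
 */
static toy_cpuset_t
toy_bridge_get_cpus(const struct toy_bridge *br, enum toy_cpu_sets op,
    toy_cpuset_t all_cpus, toy_cpuset_t global_intr_cpus)
{
	assert(br != NULL);

	switch (op) {
	case TOY_LOCAL_CPUS:
		return (br->has_pxm ? br->domain_cpus : all_cpus);
	case TOY_INTR_CPUS:
		/* Local interrupt set = global interrupt set AND domain set. */
		return (br->has_pxm ?
		    (global_intr_cpus & br->domain_cpus) : global_intr_cpus);
	}
	return (0);
}
```

For example, with eight CPUs, a global interrupt set of 0x55 (one thread per core, assuming SMT siblings are numbered adjacently) and a bridge whose domain covers CPUs 4-7 (0xF0), this model yields 0xF0 for LOCAL_CPUS and 0x50 for INTR_CPUS.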
The current patch can be found here:
https://github.com/bsdjhb/freebsd/compare/bsdjhb:master...numa_bus_get_cpus
It includes a few other fixes besides the implementation of bus_get_cpus() (and
some things have already been committed such as
taskqueue_start_threads_cpuset() and CPU_COUNT()):
- It fixes the x86 interrupt code to exclude modern SMT threads from the
default interrupt set. (Previously only Pentium 4-era HTT threads were
excluded.)
- It has a sample conversion of igb(4) to this interface (albeit an ugly one
that relies on #ifs).
Longer term I think I would like to make the INTR_CPUS thing a bit more
formal. In particular, Solaris allows you to alter the set of CPUs that
handle interrupts via prctl (or a tool named something close to that). I
think I would like to have a dedicated global cpuset for that (but not named
"2", it would be a new WHICH level). That would allow userland to use cpuset
to alter the set of CPUs that handle interrupts in case you wanted to use SMT
for example. I think if we do this that all ithreads would have their cpusets
hang off of this set instead of the root set (which would also remove some of
the recent special case handling for ithreads I believe). The one uglier part
about this is that we should probably then have a way to notify drivers that
INTR_CPUS changed so that they could try to cope gracefully. I think that's a
bit of a longer horizon thing, but for now I think bus_get_cpus() is a good
next step.
What do other folks think? (And yes, I know it needs a manpage before it goes
in, but I'd rather get the API agreed on before polishing that.)
--
John Baldwin