Re: support for asymmetric CPUs

From: Stefan Esser <se_at_FreeBSD.org>
Date: Tue, 01 Mar 2022 21:05:28 UTC
On 01.03.22 at 20:04, Mike Karels wrote:
> Has anyone been looking at scheduling issues for asymmetric CPUs like
> those with performance cores and efficiency cores?  I've been looking
> at this a little, with Intel's Alder Lake as an example.  E.g. the
> i7-12700 has 8 performance cores with SMT (hyperthreading) and 4
> efficiency cores without SMT.  The E-cores are supposedly better for
> threaded processes.  Intel also has a hardware/firmware facility to
> advise the OS about process behavior to guide placement.  I don't know
> much about this yet, but there is supposedly support pending for Linux.
> Looking ahead, the Apple M1 also has asymmetric CPUs with P-cores and
> E-cores.  (Does FreeBSD support any ARM chips with asymmetric CPUs yet?)
> 
> It seems clear that there should be a generalized interface that supports
> machine-dependent configuration, even if the hooks mostly end up pointing
> to machine-independent routines in the scheduler in common cases.  I'd
> envision initial support that just looked at CPU usage and adjusted the
> cpusets for threads that were using the default cpuset.  I was also
> thinking about exposing cpusets of P-cores and E-cores for use by
> knowledgeable user processes.  I'm not sure whether it makes sense to
> try to generalize beyond one dimension, higher-performance and lower-
> performance cores, e.g. for vector-heavy processes or other potential
> asymmetric capabilities.  I'm not sure how to generalize more, so that
> could be a future exercise if there was a reason for it.
> 
> If anyone has thought about this or has done any work on it, I'd be
> interested to hear about it.

Not identical to big/little scheduling, but IMHO related:

If the scheduler is improved to support asymmetric CPUs, then I'd
think that more intelligent handling of SMT cores should also be
considered.

If an SMT-capable core executes only one thread, it can be considered
to operate at its nominal clock rate (100%). With 2 simultaneous threads
there are 2 virtual cores, each on the order of 60% of that clock rate.

Therefore a single SMT-capable core could be considered to dynamically
switch between being 1 P-core or 2 E-cores.
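
As a rough illustration of this view, the sketch below models one 2-way
SMT core whose per-thread capacity depends on how many of its hardware
threads are busy. The 100% and 60% figures are only the ballpark numbers
from above; the table and function names are made up for illustration and
do not correspond to anything in an existing scheduler.

```python
# Rough per-thread capacity (percent of nominal clock rate) of one SMT core,
# indexed by the number of busy hardware threads. The 60 is only a ballpark.
SMT_CAPACITY_PCT = {1: 100, 2: 60}


def per_thread_capacity(busy_threads: int) -> int:
    """Return the approximate capacity each busy hardware thread sees."""
    if busy_threads not in SMT_CAPACITY_PCT:
        raise ValueError("this sketch only models a 2-way SMT core")
    return SMT_CAPACITY_PCT[busy_threads]


def core_throughput(busy_threads: int) -> int:
    """Aggregate capacity of the whole core for the given thread count."""
    return busy_threads * per_thread_capacity(busy_threads)
```

With these numbers the core looks like one P-core (100) when a single
thread runs, or like two E-cores (60 each, 120 in total) when both
hardware threads are busy.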

I'd think that it is not required to have a high-precision estimate
of the relative performance of P-cores vs. E-cores (which may also
have individual, dynamically changing clock multipliers depending on
temperature, load, and other parameters).

But the topology (especially with regard to significant cache latencies)
and the rough performance level (e.g. 100% vs. 60%) of each core should
be reflected in the scheduler logic.
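
A minimal sketch of what "reflected in the scheduler logic" could mean,
under some loud assumptions: each logical CPU carries a rough nominal
level (100 for a P-core; the 60 used for an E-core here is just a
placeholder), a busy SMT sibling scales that level to about 60%, and a
single cache group id stands in for the real topology. All names and
fields are hypothetical; this is not how any existing scheduler does it.

```python
from dataclasses import dataclass

SMT_SHARED_PCT = 60  # assumed per-thread level when both SMT siblings are busy


@dataclass
class Cpu:
    cpu_id: int
    core_id: int       # logical CPUs with the same core_id are SMT siblings
    cache_group: int   # CPUs sharing a cache level (hypothetical field)
    nominal_pct: int   # rough performance level, e.g. 100 for P, 60 for E
    busy: bool = False


def effective_pct(cpu, cpus):
    """Capacity a thread would get on `cpu`, given the state of its siblings."""
    sibling_busy = any(
        c.busy for c in cpus if c.core_id == cpu.core_id and c.cpu_id != cpu.cpu_id
    )
    if sibling_busy:
        return cpu.nominal_pct * SMT_SHARED_PCT // 100
    return cpu.nominal_pct


def pick_cpu(cpus, last_cache_group):
    """Pick the idle CPU with the best rough capacity.

    On equal capacity, prefer a CPU in the cache group the thread ran in last.
    Returns None if no CPU is idle.
    """
    idle = [c for c in cpus if not c.busy]
    if not idle:
        return None
    return max(
        idle,
        key=lambda c: (effective_pct(c, cpus), c.cache_group == last_cache_group),
    )
```

With one P-core whose first hardware thread is busy and one idle E-core,
the idle P-core sibling and the E-core both come out at roughly 60, so the
cache group decides; a fully idle P-core would win over both.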

Regards, STefan