ULE process to resolution

From: Jeff Roberson <jroberson_at_jroberson.net>
Date: Fri, 31 Mar 2023 19:43:10 UTC
Hi Folks,

For those who don't know, I am the original author of ULE.  I have not had 
much time for FreeBSD in recent years but this thread was forwarded to me 
and I am disheartened at the state of things.  I will give my perspective 
and propose a path to resolve this systematically.

The fundamental benefit of ULE is also its fundamental challenge: 
N CPU-local decisions need to add up to a reasonable approximation of a 
correct global decision.  This is necessary to scale to large core counts, 
large thread counts, and preserve some affinity.  You could permute 4BSD 
further towards these goals but I posit that you would simply have to work 
through the same bugs.

As I read these threads I can state with a high degree of confidence that 
many of these tests worked with superior results with ULE at one time. 
It may be that tradeoffs have changed or exposed weaknesses; it may also 
be that it has simply been broken over time.  I see a large number of 
commits intended to address point issues and wonder whether we adequately 
explored the consequences.  Indeed, I see solutions involving tunables 
proposed here that will definitely break other cases.

I know that CPU tradeoffs have changed.  ULE was written so that the 
topology can be annotated and the cost of migration specified.  It is 
adaptable to this but someone has to put in the effort.  The cost function 
was written in ticks, which do not scale down properly; accurate CPU 
tick counters could now be used for more precise timekeeping and more 
specific affinity.  Over time people have also added additional searches 
to pickcpu which don't scale well to very high core count systems.  NUMA 
and heterogeneous CPUs are also possible in the graph framework but need 
further investment.
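
As a rough illustration of why a cost function counted in ticks does not 
scale down, the Python sketch below measures "time since last run" in whole 
ticks.  The 1000 Hz tick rate and the helper name ticks_since are assumptions 
made for the example; this is not the ULE code.

```python
def ticks_since(last_run_us: int, now_us: int, hz: int = 1000) -> int:
    """Elapsed time in whole scheduler ticks (hz is an assumed tick rate)."""
    tick_us = 1_000_000 // hz  # 1000 us per tick at the assumed 1000 Hz
    return now_us // tick_us - last_run_us // tick_us


# A thread that ran 50 us ago and one that ran 900 us ago look identical.
print(ticks_since(10_900, 10_950))  # 0
print(ticks_since(10_050, 10_950))  # 0
# A thread that ran only 20 us ago, across a tick boundary, looks older.
print(ticks_since(9_990, 10_010))   # 1
```

At this granularity the age of a thread's cache footprint is mostly noise 
below one tick, which is where a finer-grained counter would help.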

The other thing that has changed over time is the ability of the 
interactivity score to correctly detect truly interactive applications. 
When I wrote it you could do a buildworld on a single core or small 
multi-core system and play mp3s and browse the web without a hiccup. 
However, web browsers have evolved to be significantly more resource 
intensive.  I'm not sure a heuristic can or should catch this case. 
We're probably long overdue to add X window focus hints as most other 
operating systems do.  I don't think tossing the interactivity score is 
really going to produce the desired results.  Linux CFS disagrees with me 
but I have always been able to achieve superior responsiveness with ULE. 
My intuition is that with an X window focus hint we could dial back the 
interactive threshold and have better tradeoffs with the soft real-time 
score.
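
For readers unfamiliar with the idea, here is a simplified Python model of a 
score based on the ratio of run time to voluntary sleep time.  It is a sketch 
only: the 0..100 range, the midpoint of 50, the threshold of 30 and the 
function names are assumptions for illustration, not a copy of the kernel.

```python
HALF = 50  # assumed midpoint of a 0..100 score range


def interact_score(run_time: int, sleep_time: int) -> int:
    """Lower means more interactive: mostly-sleeping threads score near 0."""
    if run_time > sleep_time:
        div = max(1, run_time // HALF)
        return HALF + (HALF - sleep_time // div)
    if sleep_time > run_time:
        div = max(1, sleep_time // HALF)
        return run_time // div
    # Equal run and sleep time: midpoint, or 0 for a thread with no history.
    return HALF if run_time else 0


def is_interactive(run_time: int, sleep_time: int, threshold: int = 30) -> bool:
    """threshold is an assumed cutoff; scores below it count as interactive."""
    return interact_score(run_time, sleep_time) < threshold


print(interact_score(100, 900))  # 5: sleeps 90% of the time
print(interact_score(400, 600))  # 33: busier, just over the assumed cutoff
print(interact_score(900, 100))  # 95: CPU bound
```

In this model a program that runs 40% of the time already falls outside the 
interactive class, which is the kind of case a heavy browser can hit.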

schedgraph is also no longer adequate for modern systems.  In my 
professional life I have taken the same types of data sources and built 
text-based processes on top because graphical representations just can't 
scale to the number of events and cores for full system scheduling.  For 
complex scheduling issues you need detailed introspection.  You're not 
going to tweak variables and run buildworlds to arrive at success by 
supposition with any kind of reasonable velocity.

The first step to resolving this is to come up with a list of regression 
tests and catalog how they behave compared to 4BSD.  When I wrote the 
scheduler I also wrote a simple fixed duty cycle program that could be run 
with different scheduling parameters and report on its cpu usage and 
latency.  Combining many copies of this program you can simulate various 
kinds of interactions.  It is available at 
people.freebsd.org/~jeff/late.tgz.  I know there is also a Linux scheduler 
benchmark that may be worth porting.
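
To show the shape of such a program, here is a minimal Python sketch of a 
fixed duty cycle worker.  It is not the late program from the tarball above; 
the function name, parameters and the choice of clocks are assumptions.

```python
import time


def duty_cycle(work_s: float, sleep_s: float, cycles: int):
    """Alternate work_s of CPU time with a sleep of sleep_s, cycles times.

    Returns (cpu_fraction, worst_wakeup_latency_s).
    """
    cpu_start = time.process_time()
    wall_start = time.monotonic()
    worst_latency = 0.0
    for _ in range(cycles):
        # Spin until this process has consumed work_s of CPU time.
        burn_start = time.process_time()
        while time.process_time() - burn_start < work_s:
            pass
        # Sleep, then measure how much later than requested we woke up.
        before = time.monotonic()
        time.sleep(sleep_s)
        late = (time.monotonic() - before) - sleep_s
        worst_latency = max(worst_latency, late)
    cpu = time.process_time() - cpu_start
    wall = time.monotonic() - wall_start
    return cpu / wall, worst_latency
```

Running several copies with different work and sleep values, and different 
nice values, then comparing the CPU fraction each one got against what it 
asked for gives a simple view of allocation and wakeup latency under load.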

If someone would start making regression tests I am happy to fix bugs or 
review bug fixes.  Personally I would start from fairness given different 
nice values on a single CPU, and then multi-CPU.  Evaluate allocation with 
varying load to core count ratios.  It should not take more than a few 
hours to iterate through the interesting cases here before going on to more 
complex questions about buildworld or firefox etc.  This would need to be 
something we carry forward in the source tree and ask people to re-run 
as part of scheduler CRs or we're just going to find ourselves back in 
this spot again.
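
One possible shape for the first fairness check, sketched in Python.  The 
pass criteria here are assumptions chosen for illustration (equal nice values 
get roughly equal shares, and a higher nice value never gets noticeably more 
CPU than a lower one); the function name and the 10% tolerance are made up.

```python
def check_fairness(cpu_by_nice, tolerance=0.10):
    """cpu_by_nice: list of (nice, cpu_seconds), one entry per worker.

    Returns a list of human-readable violations; empty means the run passed.
    """
    total = sum(cpu for _, cpu in cpu_by_nice)
    if total <= 0:
        return ["no CPU time recorded"]
    shares = [(nice, cpu / total) for nice, cpu in cpu_by_nice]
    problems = []
    for i, (nice_a, share_a) in enumerate(shares):
        for nice_b, share_b in shares[i + 1:]:
            if nice_a == nice_b:
                # Same nice value: shares should match within tolerance.
                if abs(share_a - share_b) > tolerance:
                    problems.append(
                        f"nice {nice_a}: unequal shares "
                        f"{share_a:.2f} vs {share_b:.2f}")
                continue
            # Order the pair so that 'low' has the lower nice value.
            low, high = sorted([(nice_a, share_a), (nice_b, share_b)])
            if low[1] + tolerance < high[1]:
                problems.append(
                    f"nice {high[0]} got {high[1]:.2f}, more than "
                    f"nice {low[0]} at {low[1]:.2f}")
    return problems


print(check_fairness([(0, 5.0), (0, 5.1), (10, 2.0)]))  # []
```

The input would come from per-worker CPU times measured while all workers 
compete on one CPU, then the same check repeated at higher core counts.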

I also have a backlog of improvements for large multi-core systems from 
work I did years ago that have not made it into the tree.  And I have an 
old review for patches to improve the reliability of priority in causing 
scheduling events that may be germane.  If we can collaborate on a testing 
framework I could trickle these in.

Thanks,
Jeff