Re: ULE process to resolution

From: Jeff Roberson <jroberson_at_jroberson.net>
Date: Fri, 31 Mar 2023 21:33:16 UTC
I found an old patch of mine that addresses some of the issues with rapid 
sleeping/waking batch processes here:

https://reviews.freebsd.org/D15985

Seems there are some bits relevant to behavior described earlier on 
hackers@.  I was not subscribed to this list so I can't reply to the 
specific message.

Jeff

On Fri, 31 Mar 2023, Jeff Roberson wrote:

> Hi Folks,
>
> For those who don't know, I am the original author of ULE.  I have not had 
> much time for FreeBSD in recent years but this thread was forwarded to me and 
> I am disheartened at the state of things.  I will give my perspective and 
> propose a path to resolve this systematically.
>
> The fundamental benefit of ULE is also the fundamental challenge.  That is: N 
> cpu local decisions need to add up to a reasonable approximation of a correct 
> global decision.  This is necessary to scale to large core counts, large 
> thread counts, and preserve some affinity.  You could permute 4BSD further 
> towards these goals but I posit that you would simply have to work through 
> the same bugs.
>
> As I read these threads I can state with a high degree of confidence that 
> many of these tests worked with superior results with ULE at one time. It may 
> be that tradeoffs have changed or exposed weaknesses; it may also be that 
> it's simply been broken over time.  I see a large number of commits intended 
> to address point issues and wonder whether we adequately explored the 
> consequences.  Indeed I see solutions involving tunables proposed here that 
> will definitively break other cases.
>
> I know that CPU tradeoffs have changed.  ULE was written in a way that the 
> topology could be annotated and cost of migration can be specified.  It is 
> adaptable to this but someone has to put in the effort.  The cost function 
> was written in ticks which does not scale down properly and accurate cpu tick 
> counters could now be used for more precise time-keeping for more specific 
> affinity.  Over time people have also added additional searches to pickcpu 
> which don't scale well to very high core count systems.  NUMA and 
> heterogeneous CPUs are also possible in the graph framework but need further 
> investment.
>
> The other thing that has changed over time is the ability of the 
> interactivity score to correctly detect truly interactive applications. When 
> I wrote it you could do a buildworld on a single core or small multi-core 
> system and play mp3s and browse the web without a hiccup. However, web 
> browsers have evolved to be significantly more resource intensive.  I'm not 
> sure a heuristic can or should catch this case. We're probably long overdue 
> to add x window focus hints as most other operating systems do.  I don't 
> think tossing the interactivity score is really going to produce the desired 
> results.  Linux CFS disagrees with me but I have always been able to achieve 
> superior responsiveness with ULE. My intuition is that with an x window focus 
> hint we could dial back the interactive threshold and have better tradeoffs 
> with the soft real-time score.
>
> schedgraph is also no longer adequate for modern systems.  In my professional 
> life I have taken the same types of data sources and built text based 
> processes on top because graphical representations just can't scale to the 
> number of events and cores for full system scheduling.  For complex 
> scheduling issues you need detailed introspection.  You're not going to tweak 
> variables and run buildworlds to arrive at success by supposition with any 
> kind of reasonable velocity.
>
> The first step to resolving this is to come up with a list of regression 
> tests and catalog how they behave compared to 4BSD.  When I wrote the 
> scheduler I also wrote a simple fixed duty cycle program that could be run 
> with different scheduling parameters and report on its cpu usage and latency. 
> Combining many copies of this program you can simulate various kinds of 
> interactions.  It is available at people.freebsd.org/~jeff/late.tgz.  I know 
> there is also a Linux scheduler benchmark that may be worth porting.
>
> If someone would start making regression tests I am happy to fix bugs or 
> review bug fixes.  Personally I would start from fairness given different 
> nice values on a single CPU, and then multi-cpu.  Evaluate allocation with 
> variation on load to core count ratios.  It should not take more than a few 
> hours to iterate through the interesting cases here before going on to more complex 
> questions about buildworld or firefox etc.  This would need to be something 
> we carried forward in the source tree and ask people to re-run as part of 
> scheduler CRs or we're just going to find ourselves back in this spot again.
>
> I also have a backlog of improvements for large multi-core systems from work 
> I did years ago that have not made it into the tree.  And I have an old 
> review for patches to improve the reliability of priority in causing 
> scheduling events that may be germane.  If we can collaborate on a testing 
> framework I could trickle these in.
>
> Thanks,
> Jeff
>
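
For readers who want to experiment before fetching late.tgz, the fixed duty
cycle idea described in the quoted message can be sketched roughly as below.
This is only an illustration under stated assumptions, not the actual late
program: the function name duty_cycle, its parameters, and the reported fields
are all hypothetical, and wakeup latency is approximated as how much longer a
sleep took than requested.

```python
import time


def duty_cycle(work_s, sleep_s, cycles):
    """Alternate busy work and sleep for `cycles` rounds and report stats.

    Hypothetical sketch: names and output fields are assumptions, not
    taken from the late program mentioned in the message.
    """
    latencies = []
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    for _ in range(cycles):
        # Busy phase: spin until work_s of wall-clock time has passed.
        # If the scheduler preempts us, less CPU time is consumed, which
        # shows up as a lower cpu_fraction in the report.
        deadline = time.perf_counter() + work_s
        while time.perf_counter() < deadline:
            pass
        # Sleep phase: record how late the wakeup was.
        before = time.perf_counter()
        time.sleep(sleep_s)
        overshoot = time.perf_counter() - before - sleep_s
        latencies.append(max(0.0, overshoot))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return {
        "cpu_fraction": cpu / wall,
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
    }


if __name__ == "__main__":
    # Roughly a 50% duty cycle: 10 ms of work, 10 ms of sleep.
    print(duty_cycle(0.010, 0.010, 100))
```

Running several copies at once, each started under a different nice value,
would approximate the "many copies" experiments the message describes; a
cpu_fraction well below the requested duty cycle or a large max_latency_s for
one copy would be the kind of signal worth cataloguing.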