Re: Periodic rant about SCHED_ULE

From: Kevin Bowling <kevin.bowling_at_kev009.com>
Date: Thu, 30 Mar 2023 18:39:56 UTC
On Thu, Mar 30, 2023 at 11:29 AM Kevin Bowling <kevin.bowling@kev009.com> wrote:
>
> On Thu, Mar 30, 2023 at 8:37 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > I looked into it a little more, below you can find summary and steps forward.
> >
> > First a general statement: while ULE does have performance bugs, it
> > has a better basis than 4BSD for making scheduling decisions. Most
> > notably it understands CPU topology, at least for cases which don't
> > involve big.LITTLE. For any non-freak case where 4BSD performs better,
> > it is a bug in ULE unless the cause is a tradeoff which can be tweaked
> > to line them up. Or more to the point, there should not be any
> > legitimate reason to use 4BSD these days; modulo the bugs below, you
> > are probably losing performance by using it.
>
> A simple algorithm for big.LITTLE, elided from Larry McVoy: if a
> thread runs for an entire quantum, flag a preference for a big core.
> If it runs for less or gets punted off, flag a preference for a
> little core.
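>
> A minimal sketch of that heuristic -- the struct and field names are
> hypothetical and do not correspond to anything in sched_ule.c:
>
>     /* Hypothetical per-thread state; not a real FreeBSD structure. */
>     struct bl_hint {
>             int     used_full_quantum;      /* set from the clock tick path */
>             int     prefers_big;            /* read by CPU selection */
>     };
>
>     /*
>      * Called when a thread goes off CPU.  Burning the whole quantum
>      * suggests it is CPU bound and wants a big core; leaving early
>      * (blocked, punted off) suggests a little core is good enough.
>      */
>     static void
>     bl_update_hint(struct bl_hint *h)
>     {
>             h->prefers_big = h->used_full_quantum;
>             h->used_full_quantum = 0;       /* re-evaluated each quantum */
>     }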
>
> > Bugs reported in this thread by others and confirmed by me:
> > 1. failure to load-balance when having n CPUs and n + 1 workers -- the
> > excess one stays on the same CPU thread, continuously penalizing the
> > same victim. As a result, total real time to execute a finite
> > computation is longer than with 4BSD (a reproducer sketch follows
> > below).
> > 2. unfairness of nice -n 20 threads vs threads going frequently off
> > CPU (e.g., due to I/O) -- after using only a fraction of its slice the
> > victim has to wait for the CPU hog to use up its entire slice, rinse
> > and repeat. This extends a 7+ minute buildkernel to over 67 minutes;
> > it is not an issue on 4BSD.
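> >
> > A userland reproducer for no. 1 along these lines (my construction,
> > not taken from the original reports): spawn ncpu + 1 identical
> > spinners and compare their wall-clock times.
> >
> >     /* cc -O2 -o spin spin.c -lpthread */
> >     #include <pthread.h>
> >     #include <stdint.h>
> >     #include <stdio.h>
> >     #include <stdlib.h>
> >     #include <time.h>
> >     #include <unistd.h>
> >
> >     #define WORK (1ULL << 32)       /* fixed amount of spinning per worker */
> >
> >     static void *
> >     worker(void *arg)
> >     {
> >             struct timespec s, e;
> >             volatile unsigned long long i;
> >
> >             clock_gettime(CLOCK_MONOTONIC, &s);
> >             for (i = 0; i < WORK; i++)
> >                     ;
> >             clock_gettime(CLOCK_MONOTONIC, &e);
> >             printf("worker %ld: %.2f s\n", (long)(intptr_t)arg,
> >                 (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) / 1e9);
> >             return (NULL);
> >     }
> >
> >     int
> >     main(void)
> >     {
> >             long i, n = sysconf(_SC_NPROCESSORS_ONLN) + 1;
> >             pthread_t *t = calloc(n, sizeof(*t));
> >
> >             for (i = 0; i < n; i++)
> >                     pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
> >             for (i = 0; i < n; i++)
> >                     pthread_join(t[i], NULL);
> >             free(t);
> >             return (0);
> >     }
> >
> > With fair balancing the reported times should bunch together; the
> > behavior described above shows up as one worker lagging well behind
> > the rest.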
> >
> > I have put almost no effort into investigating no. 1. There is code
> > which is supposed to rebalance load across CPUs; someone(tm) will have
> > to sit through it -- for all I know the fix is trivial.
> >
> > Fixing number 2 makes *another* bug more acute and it complicates the
> > whole ordeal.
> >
> > Thus, a bug reported by me:
> > 3. interactivity scoring is bogus -- it was originally introduced to
> > detect "interactive" behavior by equating being off CPU with waiting
> > for user input. One part of the problem is that it puts *all*
> > non-preempted off-CPU time into one bag: a voluntary sleep. This
> > includes suffering from lock contention in the kernel, lock contention
> > in the program itself, file I/O and so on, none of which has any
> > bearing on how interactive the program might happen to be. A bigger
> > part of the problem is that, at least today, graphical programs don't
> > even act this way to begin with -- they stay on CPU *a lot*.
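> >
> > To make the mechanism concrete, a simplified sketch of this kind of
> > ratio-based scoring -- the constants and exact arithmetic below are
> > made up, only the shape matches what is being criticized:
> >
> >     #define INTERACT_MAX    100
> >     #define INTERACT_THRESH 30      /* <= this counts as "interactive" */
> >
> >     /*
> >      * "slp" is *all* voluntary off-CPU time, so lock contention and
> >      * file I/O are indistinguishable from waiting for user input.
> >      */
> >     static int
> >     interact_score(unsigned long run, unsigned long slp)
> >     {
> >             if (slp > run)          /* mostly off CPU: low (good) score */
> >                     return (INTERACT_MAX / 2 * run / slp);
> >             if (run > slp)          /* mostly on CPU: high (batch) score */
> >                     return (INTERACT_MAX / 2 +
> >                         INTERACT_MAX / 2 * (run - slp) / run);
> >             return (INTERACT_MAX / 2);
> >     }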
> >
> > I asked people to provide me with the output of the following from
> > their laptops/desktops:
> >
> >     dtrace -n 'sched:::on-cpu { @[execname] = lquantize(curthread->td_priority, 0, 224, 1); }'
> >
> > One finding is that most people (at least those who reported) use firefox.
> >
> > Another finding is that the browser is above the threshold which would
> > be considered "interactive" for the vast majority of the time in all
> > reported cases.
> >
> > I booted a 2-thread VM with Xfce and decided to click around. I
> > spawned firefox, opened a file manager (Thunar) and from there opened
> > a movie to play with mpv. As root I spawned make -j 2 buildkernel. It
> > was not particularly good :)
> >
> > I found that mpv spawns a bunch of threads, most notably 2 distinct
> > threads for audio and video output. The one for video got a priority
> > of 175, while the rest had either 88 or 89 -- the lowest for
> > timesharing not considered interactive [note lower is considered
> > better].
> >
> > At the same time the file manager, which was left in the background,
> > kept doing evil syscall usage and as a result kept bouncing between a
> > regular timesharing priority and one which made it "interactive", even
> > though the program was not touched for minutes.
> >
> > Or to put it differently, the scheduler failed to recognize that mpv
> > is the program to prioritize, all while thinking the background time
> > waster is the thing to look after (so to speak).
> >
> > This brings us to fixing problem 2: currently, due to the existence of
> > said problem, the interactivity scoring woes are less acute -- the
> > venerable make -j example struggles to get CPU time and as a result
> > messes with real interactive programs to a lesser extent. If that gets
> > fixed, we are in a different boat altogether.
> >
> > I don't see a clean solution.

One other random anecdote: Windows 11 uses window focus to heavily
boost scheduling priority, in an obviously effective way.  I have no
idea how difficult something like that would be to fit into the unix
world.
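
Purely as a sketch of what the userspace plumbing could look like (the
function below is hypothetical glue, not tied to any particular window
manager, and boosting with a negative nice needs privilege):

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <err.h>

    #define FOCUS_NICE      -5      /* arbitrary boost for the focused app */

    /*
     * Call on focus-change events.  A real implementation would save and
     * restore the client's previous nice value instead of assuming 0.
     */
    static void
    focus_changed(pid_t lost, pid_t gained)
    {
            if (lost > 0 && setpriority(PRIO_PROCESS, lost, 0) == -1)
                    warn("setpriority(%d)", (int)lost);
            if (gained > 0 &&
                setpriority(PRIO_PROCESS, gained, FOCUS_NICE) == -1)
                    warn("setpriority(%d)", (int)gained);
    }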

> > Right now I'm toying with the idea of either:
> > 1. having programs explicitly tell the kernel they are interactive
> > (a hypothetical sketch of this follows below)
> > 2. adding a scheduler hook to /dev/dsp -- the observation is that if a
> > program is producing sound it probably should get some CPU time in a
> > timely manner. This would cover audio/video players and web browsers,
> > but would not cover other programs (say a PDF reader). It may be good
> > enough, though.
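> >
> > For idea 1, one shape it could take -- entirely hypothetical, no such
> > procctl(2) command exists today -- is a per-process hint the
> > scheduler may consult:
> >
> >     #include <sys/types.h>
> >     #include <sys/procctl.h>
> >     #include <sys/wait.h>
> >     #include <err.h>
> >     #include <unistd.h>
> >
> >     #define PROC_INTERACTIVITY_CTL  1000    /* hypothetical command */
> >     #define PROC_INTERACTIVE        1       /* hypothetical value   */
> >
> >     int
> >     main(void)
> >     {
> >             int hint = PROC_INTERACTIVE;
> >
> >             /* A media player or browser would do this once at startup. */
> >             if (procctl(P_PID, getpid(), PROC_INTERACTIVITY_CTL, &hint) == -1)
> >                     err(1, "procctl");
> >             /* ... event loop ... */
> >             return (0);
> >     }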
> >
> > --
> > Mateusz Guzik <mjguzik gmail.com>
> >