Re: Periodic rant about SCHED_ULE

In reply to: Mateusz Guzik : "Re: Periodic rant about SCHED_ULE"
From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Thu, 30 Mar 2023 15:36:54 UTC
I looked into it a little more; below you can find a summary and steps forward.

First, a general statement: while ULE does have performance bugs, it
has a better basis than 4BSD for making scheduling decisions. Most notably,
it understands CPU topology, at least for cases which don't involve
big.LITTLE. For any non-freak case where 4BSD performs better, it is a
bug in ULE, unless the difference stems from a tradeoff which can be
tweaked to line them up. Or more to the point, there should not be any
legitimate reason to use 4BSD these days, and, modulo the bugs below,
you are probably losing performance by doing so.

Bugs reported in this thread by others and confirmed by me:
1. failure to load-balance when there are n CPUs and n + 1 workers -- the
excess worker stays on the same CPU thread, continuously penalizing the
same victim. As a result, the total real time to execute a finite
computation is longer than with 4BSD
2. unfairness of nice -n 20 threads vs threads frequently going off
CPU (e.g., due to I/O) -- after using only a fraction of its slice, the
victim has to wait for the CPU hog to use up its entire slice, rinse
and repeat. This extends a 7+ minute buildkernel to over 67 minutes,
which is not an issue on 4BSD
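A back-of-the-envelope model may help show why bug 2 hurts so much. It assumes the hog starts a fresh slice the moment the victim goes off CPU and that the victim cannot preempt it on wakeup; the parameter values are made up for illustration and are neither ULE's real slice length nor the buildkernel numbers above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical parameters in microseconds; not measured values."""
    burst_us: int   # CPU time the victim uses before going off CPU
    off_us: int     # time the victim spends off CPU (e.g., waiting on I/O)
    slice_us: int   # full slice the CPU hog gets once it is running


def cycle_preempt(w: Workload) -> int:
    """Cycle length if the woken victim gets the CPU right away."""
    return w.burst_us + w.off_us


def cycle_wait_for_slice(w: Workload) -> int:
    """Cycle length if the woken victim must wait until the hog's slice,
    assumed to start when the victim went off CPU, is used up."""
    return w.burst_us + max(w.off_us, w.slice_us)


def slowdown(w: Workload) -> float:
    return cycle_wait_for_slice(w) / cycle_preempt(w)


if __name__ == "__main__":
    w = Workload(burst_us=500, off_us=100, slice_us=10_000)
    print(f"slowdown: {slowdown(w):.1f}x")  # 10500 / 600 = 17.5x
```

The shorter the victim's bursts relative to the hog's slice, the closer the slowdown gets to slice / burst, which is why an I/O-heavy job like buildkernel suffers the most.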

I put almost no effort into investigating no. 1. There is code
which is supposed to rebalance load across CPUs; someone(tm) will have
to sit through it -- for all I know the fix is trivial.
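For a sense of what is at stake in no. 1, the tick-based toy below compares the reported behavior (the excess worker never leaves one CPU) with an idealized global balancer that always runs the workers with the most work left. This is an illustration only, not a model of ULE's per-CPU run queues. One plausible way to end up with the static behavior, offered purely as a guess and not as a reading of ULE's balancer, is a rule that only migrates when two queues differ by at least two threads: with n + 1 workers the queue lengths are 2 and 1, so such a rule never fires.

```python
def makespan_static(n_cpus: int, work: int) -> int:
    """Ticks until all n_cpus + 1 workers finish when the excess worker is
    stuck on CPU 0, which alternates between its two workers every tick."""
    remaining = [work] * (n_cpus + 1)
    queues = [[cpu] for cpu in range(n_cpus)]
    queues[0].append(n_cpus)  # the excess worker never leaves CPU 0
    tick = 0
    while any(remaining):
        for q in queues:
            runnable = [t for t in q if remaining[t] > 0]
            if runnable:
                remaining[runnable[tick % len(runnable)]] -= 1
        tick += 1
    return tick


def makespan_global(n_cpus: int, work: int) -> int:
    """Idealized balancer: every tick, the n_cpus workers with the most work
    left get a CPU, spreading the excess worker's cost over everyone."""
    remaining = [work] * (n_cpus + 1)
    tick = 0
    while any(remaining):
        runnable = [t for t, left in enumerate(remaining) if left > 0]
        runnable.sort(key=lambda t: remaining[t], reverse=True)
        for t in runnable[:n_cpus]:
            remaining[t] -= 1
        tick += 1
    return tick


if __name__ == "__main__":
    print(makespan_static(4, 100), makespan_global(4, 100))  # 200 125
```

With n CPUs and per-worker work W, the static case takes 2W while perfect balancing takes (n + 1)W / n, so on 4 CPUs the pinned victim alone turns 125 ticks into 200.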

Fixing no. 2 makes *another* bug more acute, which complicates the
whole ordeal.

And the bug reported by me:
3. interactivity scoring is bogus -- it was originally introduced to detect
"interactive" behavior by equating being off CPU with waiting for user
input. One part of the problem is that it puts *all* non-preempted off
CPU time into one bag: a voluntary sleep. This includes suffering from
lock contention in the kernel, lock contention in the program itself,
file I/O and so on, none of which has any bearing on how interactive the
program might or might not be. A bigger part of the problem is that, at
least today, graphical programs don't even behave this way to begin
with -- they stay on CPU *a lot*.
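To make the "one bag" problem concrete, here is a simplified Python port of the scoring function as I read it in sched_ule.c (sched_interact_score()). The constants are recalled from that file, and the early-return shortcut and the decay of the run/sleep history are omitted, so treat this as a sketch rather than the kernel's exact logic. Nothing in it knows *why* a thread was off CPU, so a build job stalled on a contended lock scores like an interactive program, while a browser that mostly stays on CPU does not.

```python
# Values as recalled from sched_ule.c; they may differ between versions.
SCHED_INTERACT_MAX = 100
SCHED_INTERACT_HALF = SCHED_INTERACT_MAX // 2
SCHED_INTERACT_THRESH = 30  # default interactivity threshold


def interact_score(runtime: int, slptime: int) -> int:
    """Score in 0..100, lower meaning "more interactive". slptime is *all*
    voluntary off-CPU time: lock waits and file I/O count like user input."""
    if runtime > slptime:
        div = max(1, runtime // SCHED_INTERACT_HALF)
        return SCHED_INTERACT_HALF + (SCHED_INTERACT_HALF - slptime // div)
    if slptime > runtime:
        div = max(1, slptime // SCHED_INTERACT_HALF)
        return runtime // div
    return SCHED_INTERACT_HALF if runtime else 0


def is_interactive(runtime: int, slptime: int, nice: int = 0) -> bool:
    """As I read the kernel, nice is added to the score before comparing."""
    return max(0, interact_score(runtime, slptime) + nice) < SCHED_INTERACT_THRESH


if __name__ == "__main__":
    # Mostly blocked on a contended lock: run 100, "sleep" 400 -> score 12.
    print(interact_score(100, 400), is_interactive(100, 400))
    # Browser that stays on CPU a lot: run 400, sleep 100 -> score 88.
    print(interact_score(400, 100), is_interactive(400, 100))
```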

I asked people to provide me with the output of the following from their
laptops/desktops:

    dtrace -n 'sched:::on-cpu { @[execname] = lquantize(curthread->td_priority, 0, 224, 1); }'

One finding is that most people (at least those who reported) use firefox.

Another finding is that, in all reported cases, the browser spends the
vast majority of its time above the threshold at which it would be
considered "interactive".

I booted a 2-thread VM with Xfce and decided to click around. I spawned
firefox, opened a file manager (Thunar) and from there opened a
movie to play with mpv. As root, I spawned make -j 2 buildkernel. It
was not particularly good :)

I found that mpv spawns a bunch of threads, most notably 2 distinct
threads for audio and video output. The one for video got a priority
of 175, while the rest had either 88 or 89 -- the lowest for
timesharing not considered interactive [note: lower is considered
better].

At the same time, the file manager, which was left in the background,
kept up its evil syscall usage and as a result bounced between a regular
timesharing priority and one which made it "interactive", even though
the program had not been touched for minutes.

Or to put it differently, the scheduler failed to recognize that mpv
is the program to prioritize, all while thinking the background time
waster is the thing to look after (so to speak).

This brings us to fixing problem 2: currently, due to the existence of
said problem, the interactivity scoring woes are less acute -- the
venerable make -j example struggles to get CPU time and, as a result,
messes with genuinely interactive programs to a lesser extent. If that
gets fixed, we are in a different boat altogether.

I don't see a clean solution.

Right now I'm toying with the idea of either:
1. having programs explicitly tell the kernel they are interactive
2. adding a scheduler hook to /dev/dsp -- the observation is that if a
program is producing sound, it probably should get some CPU time in a
timely manner. This would cover audio/video players and web browsers,
but would not cover other programs (say, a PDF reader). It may be good
enough, though
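A toy model of idea 2 follows, to show what such a hook might look like from the scheduler's side. Everything in it is hypothetical: the hook name, the 200 ms window and the boosted priority are made up, and a real implementation would live in the sound driver's write path and the scheduler, not in Python. One design choice worth noting is that the boost is tracked per process rather than per thread: in the mpv case above it is the audio thread that writes to /dev/dsp, while the video thread is the one stuck at a poor priority.

```python
from dataclasses import dataclass
from typing import List, Optional

AUDIO_WINDOW_MS = 200  # hypothetical: how long one write to /dev/dsp keeps the boost
BOOST_PRI = 120        # hypothetical priority granted while boosted (lower is better)


@dataclass
class Proc:
    name: str
    last_dsp_write_ms: Optional[int] = None


@dataclass
class Thread:
    proc: Proc
    name: str
    base_pri: int  # priority the scheduler would compute without the hook


def dsp_write_hook(proc: Proc, now_ms: int) -> None:
    """Hypothetical hook: the sound driver's write path records the time."""
    proc.last_dsp_write_ms = now_ms


def effective_pri(td: Thread, now_ms: int) -> int:
    """Every thread of a process that produced sound recently is boosted."""
    last = td.proc.last_dsp_write_ms
    if last is not None and now_ms - last <= AUDIO_WINDOW_MS:
        return min(td.base_pri, BOOST_PRI)
    return td.base_pri


def pick_next(runnable: List[Thread], now_ms: int) -> Thread:
    """Pick the runnable thread with the best (lowest) effective priority."""
    return min(runnable, key=lambda td: effective_pri(td, now_ms))


if __name__ == "__main__":
    mpv, make = Proc("mpv"), Proc("make")
    video, cc = Thread(mpv, "video", 175), Thread(make, "cc", 150)
    print(pick_next([video, cc], 0).name)     # cc: mpv has not played sound yet
    dsp_write_hook(mpv, 1000)
    print(pick_next([video, cc], 1100).name)  # video: boosted via its audio sibling
```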

-- 
Mateusz Guzik <mjguzik gmail.com>