Re: Periodic rant about SCHED_ULE
- Reply: Matthias Apitz : "Re: Periodic rant about SCHED_ULE"
- In reply to: Mateusz Guzik : "Re: Periodic rant about SCHED_ULE"
Date: Thu, 30 Mar 2023 19:05:01 UTC
On Thu, Mar 30, 2023 at 11:50 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On 3/30/23, Kevin Bowling <kevin.bowling@kev009.com> wrote:
> > On Thu, Mar 30, 2023 at 11:29 AM Kevin Bowling <kevin.bowling@kev009.com> wrote:
> >>
> >> On Thu, Mar 30, 2023 at 8:37 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >> >
> >> > I looked into it a little more; below you can find a summary and steps forward.
> >> >
> >> > First, a general statement: while ULE does have performance bugs, it has a better basis than 4BSD for making scheduling decisions. Most notably it understands CPU topology, at least for cases which don't involve big.LITTLE. For any non-freak case where 4BSD performs better, it is a bug in ULE unless it comes down to a tradeoff which can be tweaked to line them up. Or more to the point, there should not be any legitimate reason to use 4BSD these days, and modulo the bugs below, you are probably losing performance by doing so.
> >>
> >> An elided simple algorithm for big.LITTLE, from Larry McVoy: if a thread runs for an entire quantum, flag a preference for a big core; if it runs for less or gets punted off, flag a preference for a little core.
> >>
> >> > Bugs reported in this thread by others and confirmed by me:
> >> > 1. Failure to load-balance when there are n CPUs and n + 1 workers -- the excess one stays on the same CPU thread continuously, penalizing the same victim. As a result, total real time to execute a finite computation is longer than in the 4BSD case.
> >> > 2. Unfairness of nice -n 20 threads vs. threads going frequently off CPU (e.g., due to I/O) -- after using only a fraction of its slice, the victim has to wait for the CPU hog to use up its entire slice, rinse and repeat. This extends a 7+ minute buildkernel to over 67 minutes; not an issue on 4BSD.
> >> >
> >> > I put almost no effort into investigating no. 1. There is code which is supposed to rebalance load across CPUs; someone(tm) will have to sit through it -- for all I know the fix is trivial.
> >> >
> >> > Fixing number 2 makes *another* bug more acute and complicates the whole ordeal.
> >> >
> >> > Thus, a bug reported by me:
> >> > 3. Interactivity scoring is bogus -- it was originally introduced to detect "interactive" behavior by equating being off CPU with waiting for user input. One part of the problem is that it puts *all* non-preempted off-CPU time into one bag: a voluntary sleep. This includes suffering from lock contention in the kernel, lock contention in the program itself, file I/O and so on, none of which has any bearing on how interactive the program might happen to be. A bigger part of the problem is that, at least today, graphical programs don't even act this way to begin with -- they stay on CPU *a lot*.
> >> >
> >> > I asked people to provide me with the output of:
> >> > dtrace -n 'sched:::on-cpu { @[execname] = lquantize(curthread->td_priority, 0, 224, 1); }'
> >> > from their laptops/desktops.
> >> >
> >> > One finding is that most people (at least those who reported) use firefox.
> >> >
> >> > Another finding is that the browser is above the threshold which would be considered "interactive" for the vast majority of the time in all reported cases.
> >> >
> >> > I booted a 2-thread vm with xfce and decided to click around. I spawned firefox, opened a file manager (Thunar) and from there opened a movie to play with mpv. As root I spawned make -j 2 buildkernel. It was not particularly good :)
> >> >
> >> > I found that mpv spawns a bunch of threads, most notably 2 distinct threads for audio and video output. The one for video got a priority of 175, while the rest had either 88 or 89 -- the lowest for timesharing not considered interactive [note: lower is considered better].
> >> >
> >> > At the same time the file manager, which had been left in the background, kept up its evil syscall usage, and as a result kept bouncing between a regular timesharing priority and one which made it "interactive", even though the program had not been touched for minutes.
> >> >
> >> > Or to put it differently, the scheduler failed to recognize that mpv is the program to prioritize, all while thinking the background time waster is the thing to look after (so to speak).
> >> >
> >> > This brings us to fixing problem 2: currently, due to the existence of said problem, the interactivity scoring woes are less acute -- the venerable make -j example struggles to get CPU time and as a result messes with real interactive programs to a lesser extent. If that gets fixed, we are in a different boat altogether.
> >> >
> >> > I don't see a clean solution.
> >
> > One other random anecdote: Windows 11 uses window focus to heavily boost scheduling priority in an obviously effective way. I have no idea how difficult something like that would be to fit into the unix world.
> >
>
> I thought about doing something like that, but I consider it dodgy. Imagine you play some crap from youtube while messing around in a text editor -- I'm pretty sure the former is more prone to disturbance from scheduling changes.
>
> Anyhow, after sending the above e-mail an actual solution hit me: the X server can tell the kernel what processes connect to it over the unix socket, which again may very well be good enough.
>
> In the reports I got I found pulseaudio; this one may need to be patched in a similar manner.

Yeah, that seems like an easier problem. IMO something like a userspace audio server (or its init script) should be in charge of setting it to RT.

> --
> Mateusz Guzik <mjguzik gmail.com>
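
As a rough sketch of that last point (an illustration only, not something spelled out in the thread): on FreeBSD a userspace daemon can move itself into the realtime class with rtprio(2). The priority value below is an arbitrary choice, and the process needs sufficient privilege to make this call.

    /* sketch: a daemon putting itself into the realtime class via rtprio(2) */
    #include <sys/types.h>
    #include <sys/rtprio.h>
    #include <err.h>

    int
    main(void)
    {
        struct rtprio rtp;

        rtp.type = RTP_PRIO_REALTIME;  /* realtime scheduling class */
        rtp.prio = 10;                 /* 0 (highest) .. RTP_PRIO_MAX within the class; 10 is arbitrary */

        /* pid 0 means the calling process; requires adequate privilege */
        if (rtprio(RTP_SET, 0, &rtp) == -1)
            err(1, "rtprio(RTP_SET)");

        /* ... the daemon's audio processing loop would run here ... */
        return (0);
    }

The init-script route mentioned above could instead use the rtprio(1) utility, e.g. launching the daemon as "rtprio 10 command ..." so it starts at that realtime priority without any change to the program itself.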