Re: Periodic rant about SCHED_ULE

From: Matthias Andree <mandree_at_freebsd.org>
Date: Mon, 27 Mar 2023 13:28:24 UTC
On 27.03.23 at 14:13, Mateusz Guzik wrote:

> 
> I repeat the setup: 8 cores, 8 processes doing cpu-bound stuff while
> niced to 20 vs make -j buildkernel
> 
> I had a little more look here, slapped in some hacks as a POC and got
> an improvement from 67 minutes above to 21.
> 
> Hacks are:
> 1. limit hog timeslice to 1 tick so that it is more eager to bail
> 2. always preempt if pri < cpri
> 
> So far I can confidently state the general problem: ULE penalizes
> non-cpu hogs for blocking (even if it is not their fault, so to speak)
> and that bumps their prio past preemption threshold, at which point
> they can't preempt said hogs (despite hogs having a higher priority).
> At the same time hogs use their full time slices, while non-hogs get
> off cpu very early and have to wait a long time to get back on, at
> least in part due to inability to preempt said hogs.
> 
> As I mentioned elsewhere in the thread, interactivity scoring takes
> "voluntary off cpu time" into account. As literally anything but
> getting preempted counts as "voluntary sleep", workers get shafted for
> going off cpu while waiting on locks in the kernel.
> 
> If I/O needs to happen and the thread waits for the result, it most
> likely does it early in its timeslice and once it's all ready it waits
> for background hogs to get off cpu -- it can't preempt them.
> 
> All that said:
> 1. "interactivity scoring" (see sched_interact_score)
> 
> I don't know if it makes any sense to begin with. Even if it does, it
> counts stuff it should not by not differentiating between deliberately
> going off cpu (e.g., actual sleep) vs just waiting for a file being
> read. Imagine firefox reading a file from disk and being considered
> less interactive for it.
> 
> I don't have a solution for this problem. I *suspect* the way to go
> would be to explicitly mark xorg/wayland/whatever as "interactive" and
> have it inherited by its offspring. At the same time it should not
> follow to stuff spawned in terminals. Not claiming this is perfect,
> but it does eliminate the guessing game.
> 
> Even so, 4BSD does not have any mechanism of the sort and reportedly
> remains usable on a desktop just by providing some degree of fairness.
> 
> Given that, I suspect the short term solution would whack said scoring
> altogether and focus on fairness (see below).
> 
> 2. fairness
> 
> As explained above doing any offcpu-time inducing work instantly
> shafts threads versus cpu hogs, even if said hogs are niced way above
> them.
> 
> Here I *suspect* position to add in the runqueue should be related to
> how much slice was left when the thread went off cpu, while making
> sure that hogs get to run eventually. Not that I have a nice way of
> implementing this -- maybe a separate queue for known hogs and picking
> them every n turns or similar.
> 

There is an analogy here, though I am not sure we can learn anything
from it: TCP (the Transmission Control Protocol) has "fairness between
streams" as one of its big overarching goals, and possibly we need to
put fairness into focus here, too.
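
To make the quoted fairness idea a bit more concrete, here is a toy
user-space sketch in C. It is not scheduler code and not a proposed
patch. All names (toy_thread, toy_rq, TOY_QLEN, HOG_EVERY) are made up
for illustration, and the policy is only one possible reading of
"a separate queue for known hogs and picking them every n turns":
non-hogs are queued by how much slice they had left when they went off
cpu, and every HOG_EVERY-th pick goes to a hog so hogs still run.

```c
#include <stddef.h>

#define TOY_QLEN  16	/* hypothetical fixed queue capacity */
#define HOG_EVERY 4	/* hypothetical: every 4th pick goes to a hog */

struct toy_thread {
	int id;
	int slice_left;	/* ticks unused when the thread went off cpu */
	int is_hog;	/* nonzero for a known cpu hog */
};

struct toy_rq {
	struct toy_thread *normal[TOY_QLEN];
	size_t nnormal;
	struct toy_thread *hogs[TOY_QLEN];
	size_t nhogs;
	unsigned turns;	/* number of picks made so far */
};

/* Queue a thread; entries are silently dropped if the toy queue is full. */
static void
toy_enqueue(struct toy_rq *rq, struct toy_thread *td)
{
	size_t i;

	if (td->is_hog) {
		if (rq->nhogs < TOY_QLEN)
			rq->hogs[rq->nhogs++] = td;	/* plain FIFO */
		return;
	}
	if (rq->nnormal >= TOY_QLEN)
		return;
	/* Insertion sort: more unused slice sorts closer to the head. */
	i = rq->nnormal++;
	while (i > 0 && rq->normal[i - 1]->slice_left < td->slice_left) {
		rq->normal[i] = rq->normal[i - 1];
		i--;
	}
	rq->normal[i] = td;
}

/* Remove and return the head of a queue, or NULL if it is empty. */
static struct toy_thread *
toy_pop(struct toy_thread **q, size_t *n)
{
	struct toy_thread *td;
	size_t i;

	if (*n == 0)
		return (NULL);
	td = q[0];
	for (i = 1; i < *n; i++)
		q[i - 1] = q[i];
	(*n)--;
	return (td);
}

/* Pick the next thread to run, or NULL if both queues are empty. */
static struct toy_thread *
toy_pick(struct toy_rq *rq)
{
	rq->turns++;
	/* Hogs get every HOG_EVERY-th pick, and any pick nobody else wants. */
	if (rq->nhogs > 0 &&
	    (rq->turns % HOG_EVERY == 0 || rq->nnormal == 0))
		return (toy_pop(rq->hogs, &rq->nhogs));
	return (toy_pop(rq->normal, &rq->nnormal));
}
```

With one hog and four non-hogs queued, the non-hogs come out in order
of unused slice, with the hog slotted in on the fourth pick. Whether
anything like this would behave well under a real workload is an open
question.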

Apparently we have now established that threads that do not use up
their quantum are not scheduled *sooner*, which seems to harm both
interactivity AND total processing time (because, say, they may want to
run a spell checker after receiving a key-press/key-release sequence,
or send off an e-mail after a click on the send button).
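
The preemption part of the quoted description can be modelled in a few
lines of C. This is a simplified model, not the code in the scheduler:
PREEMPT_THRESH and both function names are hypothetical, the threshold
value is arbitrary, and I assume the usual FreeBSD convention that a
numerically lower value means a better priority.

```c
#include <stdbool.h>

/* Hypothetical threshold; the real value and its tuning are not shown. */
#define PREEMPT_THRESH 80

/*
 * Threshold-gated preemption: the waking thread (pri) must beat the
 * running thread (cpri) AND be at or better than the threshold.
 */
static bool
preempt_with_threshold(int pri, int cpri)
{
	if (pri >= cpri)
		return (false);
	return (pri <= PREEMPT_THRESH);
}

/* Hack 2 from the quoted mail: always preempt if pri < cpri. */
static bool
preempt_always(int pri, int cpri)
{
	return (pri < cpri);
}
```

In this model a worker whose priority has drifted to, say, 120 cannot
preempt a hog running at 200 under the threshold rule, although it
would under the "always preempt" rule; the numbers are invented purely
to show the effect.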

What I currently fail to understand is why we have not set about
improving SCHED_ULE yet.  This seems to be the first time I have paid
attention to the details.  Have I missed similar discussions in the
past?  Or did they stall early on?

-- 
Matthias Andree
FreeBSD ports committer