Re: Periodic rant about SCHED_ULE

From: Dewayne Geraghty <dewayne.geraghty_at_heuristicsystems.com.au>
Date: Thu, 15 Jul 2021 01:03:04 UTC
On 15/07/2021 1:47 am, RW via freebsd-hackers wrote:

> On Thu, 8 Jul 2021 10:19:07 +0300
> Rozhuk Ivan wrote:
>
>
>> and sysctl tunings on desktop only:
>>
>> # SCHEDULER
>> kern.sched.steal_thresh=1
>> kern.sched.balance=0			 
>> kern.sched.balance_interval=1000
>> kern.sched.affinity=10000	
> You missed out
>
> kern.sched.preempt_thresh=224
>
> (perhaps because it's so well known).
>
> In my experience this makes a big difference for desktop use. If I set
> that and build on tmpfs, to minimise the effect of I/O contention, I
> don't see any discernible effect on Xfce when building world with -j4.
>
> This is on a bottom of the range i5 from 9 years ago. It's not
> particularly fast. 
>
> I think the default only allows preemption by real-time and kernel
> threads. 
>
Hi RW, note the PRI (priority) column in the output of /usr/bin/top.
Processes with a PRI below the default kern.sched.preempt_thresh=80 (i.e.
nice -n 8) may pre-empt other processes or send inter-processor
interrupts to other CPUs. A process started with
    idprio 0 top
is assigned a starting PRI of 124, so on SCHED_ULE such processes will
receive CPU time (even at idprio 31) but won't pre-empt others.
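
To make the threshold comparison concrete, here is a rough sketch of
how I read the test in sched_shouldpreempt() in sched_ule.c. It is a
simplified model, not the kernel code: it leaves out the special case
for remote CPUs and interactive threads, and the constants are simply
the values quoted in this thread (80 and 224), not taken from any
particular release. The priorities 130 and 150 in the usage lines are
made-up kernel priorities, where a lower number means higher priority.
As far as I can tell, top prints the kernel priority minus an offset
(PZERO), so top's PRI column is not directly comparable to these
numbers.

```python
# Simplified model of the SCHED_ULE preemption test (an assumption-laden
# paraphrase of sched_shouldpreempt(); check the source for your release).
# Lower number = higher priority.
PRI_MIN_IDLE = 224     # assumed start of the idle class; RW's suggested value
DEFAULT_THRESH = 80    # default kern.sched.preempt_thresh quoted above


def should_preempt(new_pri: int, cur_pri: int, thresh: int) -> bool:
    """Return True if a thread at new_pri would preempt one at cur_pri."""
    if new_pri >= cur_pri:
        return False   # not a better priority: nothing to do
    if cur_pri >= PRI_MIN_IDLE:
        return True    # an idle-class thread is always preempted
    if thresh == 0:
        return False   # preemption disabled altogether
    return new_pri <= thresh  # only "urgent enough" threads preempt


if __name__ == "__main__":
    # Two timeshare threads with hypothetical priorities 130 and 150.
    print(should_preempt(130, 150, DEFAULT_THRESH))  # default threshold
    print(should_preempt(130, 150, PRI_MIN_IDLE))    # preempt_thresh=224
```

Under this model the first call says no and the second says yes, which
is at least consistent with RW's observation: with the threshold at 224
an ordinary timeshare thread can pre-empt another one instead of
waiting for it to give up the CPU.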

If you really want all processes to pre-empt others, enabling
FULL_PREEMPTION achieves the same goal as 224.  I don't have a use case
for no pre-emption. Anyone?

As to why kern.sched.preempt_thresh=224 helps desktop users, I can only
speculate: with a high threshold, more IPIs are sent to other CPU
cores, so they are kept busy (?).  Refer to /usr/src/sys/kern/sched_ule.c
--
Returning to the topic: it's a very hard choice between schedulers.  I
did a lot of testing and tuning of both to see if one excelled on
my humble Xeon-E3.  I couldn't see a significant difference across
workloads - though next time (and a hint for others) I'll disable SMT
and set dev.cpu.0.freq to disable turbo behaviour.  For now, sched_4bsd
appears to be the simpler of the two in code complexity, and people with
high CPU workloads have preferred it in the past, while
sched_ule has a lot of things to tweak and is recommended by the FreeBSD
project; otherwise it wouldn't be the default.

Looking at the histories of
https://github.com/freebsd/freebsd-src/tree/main/sys/kern/sched_*.c
both are tweaked a couple of times a year, so I wouldn't rule
sched_4bsd out of contention just yet.

FWIW, my servers modify only:
kern.sched.affinity=7
kern.sched.interact=0
kern.sched.slice=128
while firewalls:
kern.sched.balance=0
kern.sched.interact=0

A loadable scheduler has been discussed here a few times - I vaguely
recall it being considered inefficient (complexity) and unnecessary
(you'll settle on one scheduler and, unless testing, are unlikely to
change it).  Further in the past, sched_4bsd was to be removed, but some
demonstrated that it performed better for their workload.
Cheerio.