Re: Periodic rant about SCHED_ULE
- Reply: George Mitchell : "Re: Periodic rant about SCHED_ULE"
- Reply: Ian Lepore : "Re: Periodic rant about SCHED_ULE"
- In reply to: Rozhuk Ivan : "Re: Periodic rant about SCHED_ULE"
Date: Tue, 13 Jul 2021 22:09:27 UTC
I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257160 regarding the following:

SCHED_4BSD seems subject to a bit of rot at this point. To wit, my 4-core riscv64 platform recently showed this top while doing a make -j4 of my own code. Note that each of the processes using more than 1000% CPU is single-threaded.

      PID USERNAME  THR PRI NICE  SIZE   RES STATE   C  TIME     WCPU COMMAND
      604 dgilbert    1  45    0  109M   66M CPU3    3  0:02 1039.89% c++
      605 dgilbert    1  45    0  109M   66M CPU1    1  0:02 1031.29% c++
      606 dgilbert    1  45    0  109M   66M RUN     2  0:02 1020.32% c++
      603 dgilbert    1  44    0  109M   66M CPU0    0  0:02 1011.41% c++
      854 root        1  40    0   17M 4764K select  1  3:04    0.17% tmux
      425 root        1  40    0   14M 4040K CPU2    2  0:03    0.15% top

As I said there, I don't believe that this is RISCV64-related: it seems to me that either the data top is pulling is incorrect or top is interpreting it incorrectly. The WCPU value seems to approach 100% asymptotically, but I'm not sure of that; I can only watch it for so long. The same behaviour is seen if you launch

    (while true; do true; done) &

in the background.

OTOH, if you are running SCHED_ULE and you launch two of those while-true loops per CPU under nice -20 (niceness 20, as the NICE column below shows), then launch one more at nice 0, you'll find that the nice 0 process fails to get 100% CPU. To my mind, this is a failure of the scheduler to honour my intent in using nice -20. In fact, at times the processor share of the un-niced process falls below that of some of the niced processes for a few dozen samples at a time. Here is a top displaying that brokenness:

      PID USERNAME  THR PRI NICE  SIZE   RES STATE   C  TIME    WCPU COMMAND
    36410 root        1  89    0   14M  796K RUN     3  0:18  54.31% bash
    36370 root        1 106   20   14M  800K RUN     1  0:58  49.86% bash
    36372 root        1 105   20   14M  800K CPU1    1  0:56  49.69% bash
    36375 root        1 106   20   14M  800K RUN     0  0:57  46.37% bash
    36373 root        1 103   20   14M  800K RUN     3  0:56  44.94% bash
    36371 root        1 105   20   14M  800K CPU0    0  0:57  43.51% bash
    36376 root        1 105   20   14M  800K RUN     2  0:59  38.76% bash
    36369 root        1 104   20   14M  920K CPU2    2  0:57  37.61% bash
    36374 root        1 104   20   14M  800K RUN     2  0:57  32.66% bash

TBH, I think SCHED_ULE is a failure, and the only reason more people don't think so is that processors are now largely too fast for people to care. Most people don't notice the scheduler because they almost never have more tasks than processor threads, so even a really dumb scheduler would work out "OK" 98% of the time.

I know we don't have guiding principles for nice, but I would propose a +/- five rule for it: any process more than 5 nice levels lower in priority than a CPU-busy process shouldn't preempt that higher-priority process. I realize we have rtprio, but it's a pain to use. Anyways, don't let this last comment distract.

On Thu, Jul 8, 2021 at 3:20 AM Rozhuk Ivan <rozhuk.im@gmail.com> wrote:

> On Wed, 7 Jul 2021 13:47:47 -0400
> George Mitchell <george+freebsd@m5p.com> wrote:
>
> > CPU: AMD Ryzen 5 2600X Six-Core Processor (3600.10-MHz K8-class CPU)
> > (12 threads).
> >
> > FreeBSD 12.2-RELEASE-p7 r369865 GENERIC amd64 (SCHED_ULE) vs
> > FreeBSD 12.2-RELEASE-p7 r369865 M5P amd64 (SCHED_4BSD).
> >
> > Comparing "make buildworld" time with misc/dnetc running vs not
> > running. (misc/dnetc is your basic 100% compute-bound task, running
> > at nice 20.)
> >
> > Three out of the four combinations build in roughly four hours, but
> > SCHED_ULE with dnetc running takes close to twelve! (And that was
> > overnight with basically nothing else running.) This is an even
> > worse disparity than I have seen in previous releases.
>
> I do not use dnetc, but SCHED_ULE on a 2700 compiles world in less than 4 hours.
> With ccache it takes ~10 minutes: world+kernel build and install, and
> updating the loaders.
>
>
> # Make an SMP-capable kernel by default
> options   SMP                 #b Symmetric MultiProcessor Kernel
> options   NUMA                #o Non-Uniform Memory Architecture support
> options   EARLY_AP_STARTUP    #o
>
> device    cpufreq             #m for non-ACPI CPU frequency control
> device    cpuctl              #m Provides access to MSRs, CPUID info and microcode update feature.
>
>
> # Kernel base
> options   SCHED_ULE           #b 4BSD/ULE scheduler
> options   _KPOSIX_PRIORITY_SCHEDULING  #b POSIX P1003_1B real-time extensions
> options   PREEMPTION          #b Enable kernel thread preemption
>
>
> and sysctl tunings on desktop only:
> # SCHEDULER
> kern.sched.steal_thresh=1          # Minimum load on remote CPU before we'll steal // workaround for freezes
> kern.sched.balance=0               # Enables the long-term load balancer
> kern.sched.balance_interval=1000   # Average period in stathz ticks to run the long-term balancer
> kern.sched.affinity=10000          # Number of hz ticks to keep thread affinity for
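As a rough cross-check of the two top snapshots earlier in this message, here is a minimal Python sketch. It assumes the column layout shown in those snapshots and that no field contains spaces; parse_top, impossible_wcpu and total_wcpu are hypothetical helper names, not part of top(1) or FreeBSD. It flags single-threaded processes reporting more than 100% WCPU (one thread can occupy at most one CPU) and sums WCPU in the nice test: the total comes to about 398% of the 400% a 4-CPU machine can deliver, while the lone nice 0 process gets only 54.31%.

```python
"""Sanity-check top(1) snapshots (illustrative sketch, hypothetical helpers)."""

from dataclasses import dataclass

# Column order as printed in the two top snapshots quoted in this thread.
COLUMNS = ["PID", "USERNAME", "THR", "PRI", "NICE", "SIZE",
           "RES", "STATE", "C", "TIME", "WCPU", "COMMAND"]


@dataclass
class Proc:
    pid: int
    threads: int
    nice: int
    wcpu: float  # weighted CPU, in percent of one CPU
    command: str


def parse_top(text: str) -> list[Proc]:
    """Parse process rows; assumes no field (e.g. COMMAND) contains spaces."""
    procs = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that doesn't match the layout.
        if len(fields) != len(COLUMNS) or fields[0] == "PID":
            continue
        row = dict(zip(COLUMNS, fields))
        procs.append(Proc(pid=int(row["PID"]),
                          threads=int(row["THR"]),
                          nice=int(row["NICE"]),
                          wcpu=float(row["WCPU"].rstrip("%")),
                          command=row["COMMAND"]))
    return procs


def impossible_wcpu(procs: list[Proc]) -> list[Proc]:
    """Each thread can occupy at most one CPU, i.e. 100% WCPU per thread."""
    return [p for p in procs if p.wcpu > 100.0 * p.threads]


def total_wcpu(procs: list[Proc]) -> float:
    """Sum of WCPU over all listed processes, in percent of one CPU."""
    return sum(p.wcpu for p in procs)


# make -j4 snapshot from the 4-core riscv64 box.
RISCV_MAKE = """
  PID USERNAME  THR PRI NICE  SIZE   RES STATE   C  TIME     WCPU COMMAND
  604 dgilbert    1  45    0  109M   66M CPU3    3  0:02 1039.89% c++
  605 dgilbert    1  45    0  109M   66M CPU1    1  0:02 1031.29% c++
  606 dgilbert    1  45    0  109M   66M RUN     2  0:02 1020.32% c++
  603 dgilbert    1  44    0  109M   66M CPU0    0  0:02 1011.41% c++
  854 root        1  40    0   17M 4764K select  1  3:04    0.17% tmux
  425 root        1  40    0   14M 4040K CPU2    2  0:03    0.15% top
"""

# One nice 0 loop against eight nice 20 loops on 4 CPUs.
NICE_TEST = """
  PID USERNAME  THR PRI NICE  SIZE   RES STATE   C  TIME    WCPU COMMAND
36410 root        1  89    0   14M  796K RUN     3  0:18  54.31% bash
36370 root        1 106   20   14M  800K RUN     1  0:58  49.86% bash
36372 root        1 105   20   14M  800K CPU1    1  0:56  49.69% bash
36375 root        1 106   20   14M  800K RUN     0  0:57  46.37% bash
36373 root        1 103   20   14M  800K RUN     3  0:56  44.94% bash
36371 root        1 105   20   14M  800K CPU0    0  0:57  43.51% bash
36376 root        1 105   20   14M  800K RUN     2  0:59  38.76% bash
36369 root        1 104   20   14M  920K CPU2    2  0:57  37.61% bash
36374 root        1 104   20   14M  800K RUN     2  0:57  32.66% bash
"""

if __name__ == "__main__":
    bad = impossible_wcpu(parse_top(RISCV_MAKE))
    print("impossible WCPU:", [p.pid for p in bad])

    procs = parse_top(NICE_TEST)
    print(f"total WCPU: {total_wcpu(procs):.2f}% of 400% on 4 CPUs")
    for p in procs:
        if p.nice == 0:
            print(f"nice 0 process {p.pid}: {p.wcpu:.2f}%")
```

Pasting a fresh top capture into either string is enough to repeat the check; the per-thread bound is deliberately loose, so it only catches values that no scheduler could legitimately produce.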