ULE steal_idle questions
Don Lewis
truckman at FreeBSD.org
Sat Aug 26 18:29:39 UTC 2017
On 26 Aug, Rodney W. Grimes wrote:
>> On Fri, 25 Aug 2017, Don Lewis wrote:
>>
>> > ...
>> > Something else that I did not expect is how frequently threads are
>> > stolen from the other SMT thread on the same core, even though I
>> > increased steal_thresh from 2 to 3 to account for the off-by-one
>> > problem. This is true even right after the system has booted and no
>> > significant load has been applied. My best guess is that because of
>> > affinity, both the parent and child processes run on the same CPU after
>> > fork(), and if a number of processes are forked() in quick succession,
>> > the run queue of that CPU can get really long. Forcing a thread
>> > migration in exec() might be a good solution.
>>
>> Since you are trying a lot of combinations, maybe you can tell us which
>> ones work best. SCHED_4BSD works better for me on an old 2-core system.
>> SCHED_ULE works better on a not-so-old 4x2 core (Haswell) system, but I
>> don't like it due to its complexity. It makes differences of at most
>> +-2% except when mistuned it can give -5% for real time (but better for
>> CPU and presumably power).
>>
>> For SCHED_4BSD, I wrote fancy tuning for fork/exec and sometimes get
>> everything to line up for a 3% improvement (803 seconds instead of 823
>> on the old system, with -current much slower at 840+ and old versions
>> of ULE before steal_idle taking 890+). This is very resource (mainly
>> cache associativity?) dependent and my tuning makes little difference
>> on the newer system. SCHED_ULE still has bugfeatures which tend to
>> help large builds by reducing context switching, e.g., by bogusly
>> clamping all CPU-bound threads to nearly maximal priority.
>
> That last bugfeature is probably what makes current systems'
> interactive performance tank rather badly when under heavy
> loads. Would it be hard to fix?
I actually haven't noticed that problem on my package build boxes. I've
experienced decent interactive performance even when the load average is
in the 60 to 80 range. I also have poudriere configured to use tmpfs
and the only issue I run into is when it starts getting heavily into
swap (like 20G) and I leave my session idle for a while, which lets my
shell and sshd get swapped out. Then it takes them a while to wake up
again. Once they are paged in, then things feel snappy again. This is
remote access, so I can't comment on what X11 feels like.
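
To make the fork() guess quoted above a bit more concrete, here is a toy
model of the steal threshold check. It is only a sketch and not the
sched_ule.c code. It assumes that a CPU's load counts the running thread
plus the queued ones, and that an idle CPU may steal only when the
victim's load is at least steal_thresh. The names Cpu, may_steal_from
and fork_burst are made up for illustration.

```python
# Toy model only; this is NOT the sched_ule.c implementation.
from dataclasses import dataclass


@dataclass
class Cpu:
    running: int  # 1 if a thread is currently on the CPU, else 0
    queued: int   # threads waiting in this CPU's run queue

    @property
    def load(self) -> int:
        # Assumption: the load figure includes the running thread.
        return self.running + self.queued


def may_steal_from(victim: Cpu, steal_thresh: int) -> bool:
    # Assumption: stealing needs a queued thread to take and a victim
    # load that has reached the threshold.
    return victim.queued > 0 and victim.load >= steal_thresh


def fork_burst(n_children: int) -> Cpu:
    # Hypothetical worst case for the affinity guess: the parent keeps
    # running and every child is queued on the parent's CPU.
    return Cpu(running=1, queued=n_children)


if __name__ == "__main__":
    for thresh in (2, 3):
        for n in (1, 2, 3):
            victim = fork_burst(n)
            print(f"steal_thresh={thresh} children={n} "
                  f"load={victim.load} steal={may_steal_from(victim, thresh)}")
```

Under these assumptions, a threshold of 2 already allows a steal after a
single fork(), and a threshold of 3 allows one after two forks in quick
succession, which would fit with steals still showing up on a lightly
loaded system.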