Re: Periodic rant about SCHED_ULE

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Sat, 25 Mar 2023 21:51:49 UTC
On Sat, Mar 25, 2023 at 01:41:16PM -0700, Mark Millard wrote:
! On Mar 25, 2023, at 11:58, Peter <pmc@citylink.dinoex.sub.org> wrote:

! > ! 
! > ! At which point I get the likes of:
! > ! 
! > ! 17129 root          1  68    0  14192Ki    3628Ki RUN     13   0:20   3.95% gzip -9
! > ! 17128 root          1  20    0  58300Ki   13880Ki pipdwt  18   0:00   0.27% tar cvf - / (bsdtar)
! > ! 17097 root          1 133    0  13364Ki    3060Ki CPU13   13   8:05  95.93% sh -c while true; do :; done
! > ! 
! > ! up front.
! > 
! > Ah. So? To me this doesn't look good. If both jobs are runnable, they
! > should each get ~50%.
! > 
! > ! For reference, I also see the likes of the following from
! > ! "gstat -spod" (it is a root on ZFS context with PCIe Optane media):
! > 
! > So we might assume that indeed both jobs are runnable, and the only
! > significant difference is that one does system calls while the other
! > doesn't.
! > 
! > The point of all this is: identify the malfunction with the
! > simplest use case. (And for me there is a malfunction here.)
! > And then, obviously, fix it.
! 
! I tried the following that still involves pipe-io but avoids
! file system I/O (so: simplifying even more):
! 
! cat /dev/random | cpuset -l 13 gzip -9 >/dev/null 2>&1
! 
! mixed with:
! 
! cpuset -l 13 sh -c "while true; do :; done" &
! 
! So far what I've observed is just the likes of:
! 
! 17736 root          1 112    0  13364Ki    3048Ki RUN     13   2:03  53.15% sh -c while true; do :; done
! 17735 root          1 111    0  14192Ki    3676Ki CPU13   13   2:20  46.84% gzip -9
! 17734 root          1  23    0  12704Ki    2364Ki pipewr  24   0:14   4.81% cat /dev/random
! 
! Simplifying this much seems to get a different result.

Okay, then you have simplified too much and the malfunction is no
longer visible.

! Pipe I/O of itself does not appear to lead to the
! behavior you are worried about.

How many bytes does /dev/random deliver in a single read()?
 
! Trying cat /dev/zero instead ends up similar:
! 
! 17778 root          1 111    0  14192Ki    3672Ki CPU13   13   0:20  51.11% gzip -9
! 17777 root          1  24    0  12704Ki    2364Ki pipewr  30   0:02   5.77% cat /dev/zero
! 17736 root          1 112    0  13364Ki    3048Ki RUN     13   6:36  48.89% sh -c while true; do :; done
! 
! It seems that, compared to using tar and a file system, there
! is some significant difference in context that leads to the
! behavioral difference. It would probably be of interest to know
! what the distinction(s) are in order to have a clue how to
! interpret the results.

I can tell you:
tar most likely cannot put data from more than one input file into a
single write(). So, when reading big files, we probably get 16k or
more per system call over the pipe. But if the files are significantly
smaller than that (e.g. in /usr/include), then gzip does more system
calls per unit of time. And that makes a difference, because a system
call goes into the scheduler and reschedules the thread.
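
A rough way to check the per-read sizes, as a sketch only: let dd stand
in for gzip as the reader on the pipe. With a block size, dd does one
read() per record and reports short reads as partial records, so
bytes divided by records approximates the bytes per system call. The
function name avg_read_size and the 64k block size are my own choices
here, and the numbers depend on the timing between writer and reader,
so they only hint at what gzip itself would see.

```shell
#!/bin/sh
# avg_read_size: consume stdin with dd and print the average number of
# bytes delivered per read(). Hypothetical helper, not part of any tool.
avg_read_size() {
    # dd writes its statistics to stderr; the data itself is discarded.
    # LC_ALL=C keeps the statistics lines untranslated for parsing.
    LC_ALL=C dd of=/dev/null bs=64k 2>&1 | awk '
        # "F+P records in": F full and P partial (short) reads
        /records in/   { split($1, r, "+"); reads = r[1] + r[2] }
        # "N bytes transferred ..." (BSD) or "N bytes ... copied" (GNU)
        $2 == "bytes"  { bytes = $1 }
        END {
            if (reads > 0)
                printf "%d reads, %s bytes, %d bytes/read\n",
                       reads, bytes, bytes / reads
            else
                print "no data read"
        }'
}
```

Then compare something like

    tar cf - /usr/include | avg_read_size

against the same command on a directory holding a few big files.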

This 95% vs. 5% imbalance is the actual problem that has to be
addressed. It is not acceptable for me: I cannot have my tasks
starving along at a tenth of the expected compute just because some
number crunching also runs on the same core.

Now, reading from /dev/random cannot reproduce it. Reading from
tar can reproduce it under certain conditions - and that is all that
is needed.
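
For reference, a minimal sketch of such a reproduction attempt. The
helper names pin and compress_tree are made up for this mail, CPU 13 is
taken from the runs quoted above, and /usr/include is just my guess at
a suitable tree of small files. I cannot promise that it shows the
imbalance on every machine.

```shell
#!/bin/sh
# Sketch only: CPU number and tree are assumptions, adjust as needed.
CPU=${CPU:-13}               # core that both jobs compete for
TREE=${TREE:-/usr/include}   # tree with many small files

# pin CMD...: run CMD restricted to $CPU via cpuset(1).
# Where cpuset is missing, warn and run unpinned (then the comparison
# is not meaningful, it only keeps the script usable elsewhere).
pin() {
    if command -v cpuset >/dev/null 2>&1; then
        cpuset -l "$CPU" "$@"
    else
        echo "pin: no cpuset(1) found, running unpinned" >&2
        "$@"
    fi
}

# compress_tree: tar $TREE into a pinned gzip -9 and discard the output.
# Only gzip is pinned; tar may run on any core, as in the quoted runs.
compress_tree() {
    tar cf - "$TREE" 2>/dev/null | pin gzip -9 >/dev/null
}
```

Start the competing loop first, then run the pipeline and watch top:

    cpuset -l 13 sh -c "while true; do :; done" &
    compress_tree
    kill $!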