Re: Periodic rant about SCHED_ULE

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Sun, 26 Mar 2023 01:18:20 UTC
On Sat, Mar 25, 2023 at 03:35:36PM -0700, Mark Millard wrote:
! > On Mar 25, 2023, at 14:51, Peter <pmc@citylink.dinoex.sub.org> wrote:
! > 
! > On Sat, Mar 25, 2023 at 01:41:16PM -0700, Mark Millard wrote:
! > ! On Mar 25, 2023, at 11:58, Peter <pmc@citylink.dinoex.sub.org> wrote:
! > 
! > ! > ! 
! > ! > ! At which point I get the likes of:
! > ! > ! 
! > ! > ! 17129 root          1  68    0  14192Ki    3628Ki RUN     13   0:20   3.95% gzip -9
! > ! > ! 17128 root          1  20    0  58300Ki   13880Ki pipdwt  18   0:00   0.27% tar cvf - / (bsdtar)
! > ! > ! 17097 root          1 133    0  13364Ki    3060Ki CPU13   13   8:05  95.93% sh -c while true; do :; done
! > ! > ! 
! > ! > ! up front.
! > ! > 
! > ! > Ah. So? To me this doesn't look good. If both jobs are runnable, they
! > ! > should each get ~50%.
! > ! > 
! > ! > ! For reference, I also see the likes of the following from
! > ! > ! "gstat -spod" (it is a root on ZFS context with PCIe Optane media):
! > ! > 
! > ! > So we might assume that indeed both jobs are runnable, and the only
! > ! > significant difference is that one does system calls while the other
! > ! > doesn't.
! > ! > 
! > ! > The point of all this is: identify the malfunction with the
! > ! > simplest use case. (And for me, there is a malfunction here.)
! > ! > And then, obviously, fix it.
! > ! 
! > ! I tried the following that still involves pipe-io but avoids
! > ! file system I/O (so: simplifying even more):
! > ! 
! > ! cat /dev/random | cpuset -l 13 gzip -9 >/dev/null 2>&1
! > ! 
! > ! mixed with:
! > ! 
! > ! cpuset -l 13 sh -c "while true; do :; done" &
! > ! 
! > ! So far what I've observed is just the likes of:
! > ! 
! > ! 17736 root          1 112    0  13364Ki    3048Ki RUN     13   2:03  53.15% sh -c while true; do :; done
! > ! 17735 root          1 111    0  14192Ki    3676Ki CPU13   13   2:20  46.84% gzip -9
! > ! 17734 root          1  23    0  12704Ki    2364Ki pipewr  24   0:14   4.81% cat /dev/random
! > ! 
! > ! Simplifying this much seems to get a different result.
! > 
! > Okay, then you have simplified too much and the malfunction is not
! > visible anymore.
! > 
! > ! Pipe I/O of itself does not appear to lead to the
! > ! behavior you are worried about.
! > 
! > How many bytes does /dev/random deliver in a single read()?
! > 
! > ! Trying cat /dev/zero instead ends up similar:
! > ! 
! > ! 17778 root          1 111    0  14192Ki    3672Ki CPU13   13   0:20  51.11% gzip -9
! > ! 17777 root          1  24    0  12704Ki    2364Ki pipewr  30   0:02   5.77% cat /dev/zero
! > ! 17736 root          1 112    0  13364Ki    3048Ki RUN     13   6:36  48.89% sh -c while true; do :; done
! > ! 
! > ! It seems that, compared to using tar and a file system, there
! > ! is some significant difference in context that leads to the
! > ! behavioral difference. It would be useful to know what the
! > ! distinction(s) are in order to have a clue how to interpret
! > ! the results.
! > 
! > I can tell you:
! > With tar, tar likely cannot output data from more than one input
! > file in a single write(). So, when reading big files, we probably
! > get 16k or more per system call over the pipe. But if the files
! > are significantly smaller than that (e.g. in /usr/include), then
! > gzip ends up doing more system calls per unit of time. And that
! > makes a difference, because a system call goes into the scheduler
! > and reschedules the thread.
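
One way to verify the per-call transfer sizes would be to attach
truss to the running gzip and watch the read()/write() lengths. A
rough, untested sketch (the pgrep pattern is only an assumption about
how the process shows up in the process list):

  # show the first syscalls of the running gzip, with byte counts
  truss -p $(pgrep -n -f 'gzip -9') 2>&1 | head -n 20
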
! > 
! > This 95% vs. 5% imbalance is the actual problem that has to be
! > addressed, because this is not acceptable for me: I cannot have
! > my tasks starving along at a tenth of the expected compute only
! > because some number crunching also happens to run on the core.
! > 
! > Now, reading from /dev/random cannot reproduce it. Reading from
! > tar can reproduce it under certain conditions - and that is all that
! > is needed.
! 
! The suggestion that the size of the transfers into the
! first pipe matters is backed up by experiments with the
! likes of:
! 
! dd if=/dev/zero bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/zero bs=132 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/zero bs=133 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/zero bs=192 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/zero bs=1k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/zero bs=4k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/zero bs=16k | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! 
! (just examples), each paired up with:
! 
! cpuset -l 13 sh -c "while true; do :; done" &
! 
! This avoids the uncontrolled variability of using tar
! against a file system.
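
As an aside, such a sweep could be scripted so that each bs= value
runs for a fixed interval against the spinning shell and the CPU
share of that round's gzip gets recorded. A rough, untested sketch
(CPU 13 and the 20 second interval are just the values used above;
it assumes no other gzip is running):

  #!/bin/sh
  cpuset -l 13 sh -c "while true; do :; done" &
  SPIN=$!
  for bs in 128 132 133 192 1k 4k 16k; do
      dd if=/dev/zero bs=$bs 2>/dev/null | \
          cpuset -l 13 gzip -9 >/dev/null 2>&1 &
      sleep 20
      # CPU share as reported by ps for the gzip of this round
      echo "bs=$bs: $(ps -o %cpu= -p $(pgrep -n gzip))% gzip"
      pkill -f "dd if=/dev/zero bs=$bs"   # closing the pipe ends that gzip
  done
  kill $SPIN
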
! 
! But an interesting comparison/contrast results from, for
! example:
! 
! dd if=/dev/zero bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! vs.
! dd if=/dev/random bs=128 | cpuset -l 13 gzip -9 >/dev/null 2>&1 &
! 
! each paired with: cpuset -l 13 sh -c "while true; do :; done" &
! 
! At least in my context, the /dev/zero one ends up with:
! 
! 18251 root          1  68    0  14192Ki    3676Ki RUN     13   0:02   1.07% gzip -9
! 18250 root          1  20    0  12820Ki    2484Ki pipewr  29   0:02   1.00% dd if=/dev/zero bs=128
! 18177 root          1 135    0  13364Ki    3048Ki CPU13   13  14:47  98.93% sh -c while true; do :; done
! 
! but the /dev/random one ends up with:
! 
! 18253 root          1 108    0  14192Ki    3676Ki CPU13   13   0:09  50.74% gzip -9
! 18252 root          1  36    0  12820Ki    2488Ki pipewr  30   0:03  16.96% dd if=/dev/random bs=128
! 18177 root          1 115    0  13364Ki    3048Ki RUN     13  15:45  49.26% sh -c while true; do :; done
! 
! It appears that the CPU time needed by the dd feeding the
! first pipe (or more than just that) matters for the overall
! result, not just the bs= value used.
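
For what it's worth, the feeder's own CPU cost per source could be
checked directly by writing to /dev/null and looking at the rusage
output; a rough, untested sketch (the count is arbitrary, just enough
to get stable numbers):

  /usr/bin/time -l dd if=/dev/zero   of=/dev/null bs=128 count=1000000
  /usr/bin/time -l dd if=/dev/random of=/dev/null bs=128 count=1000000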

This is all fine, but it is of no relevance to me. I'm not an
entomologist; I don't want to explore bugs, I just want to kill them.