Re: Periodic rant about SCHED_ULE
- In reply to: Mark Millard : "Re: Periodic rant about SCHED_ULE"
Date: Sat, 25 Mar 2023 18:23:04 UTC
On Mar 25, 2023, at 11:14, Mark Millard <marklmi@yahoo.com> wrote:

> Peter <pmc_at_citylink.dinoex.sub.org> wrote on
> Date: Sat, 25 Mar 2023 15:47:42 UTC :
>
>> Quoting George Mitchell <george+freebsd@m5p.com>:
>>
>>>> https://forums.freebsd.org/threads/what-is-sysctl-kern-sched-preempt_thresh.85
>>>>
>>> Thank you! -- George
>>
>> You're welcome. Can I get a success/failure report?
>>
>> ---------------------------------------------------------------------
>>>> On 3/22/23, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>>>>>
>>>>> I reported the issue with ULE some 15 to 20 years ago.
>>
>> Can I get the PR number, please?
>>
>> ---------------------------------------------------------------------
>> Test use case:
>> ==============
>>
>> Create two compute tasks competing for the same -otherwise unused- core,
>> one without, one with syscalls:
>>
>> # cpuset -l 13 sh -c "while true; do :; done" &
>> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
>>
>> Within a few seconds the two tasks are balanced, running at nearly the
>> same PRI and each using 50% of the core:
>>
>>   PID USERNAME  THR PRI NICE  SIZE    RES STATE   C   TIME   WCPU COMMAND
>>  5166 root        1  88    0   13M  3264K RUN    13   9:23 51.65% sh
>> 10675 root        1  87    0   13M  3740K CPU13  13   1:30 48.57% gzip
>>
>> This changes when the tar reaches /usr/include with its many small
>> files. Now smaller blocks are delivered to gzip, it does more
>> syscalls, and things get ugly:
>>
>>   PID USERNAME  THR PRI NICE  SIZE    RES STATE   C   TIME   WCPU COMMAND
>>  5166 root        1  94    0   13M  3264K RUN    13  18:07 95.10% sh
>> 19028 root        1  81    0   13M  3740K CPU13  13   1:23  4.87% gzip
>
> Why did PID 10675 change to 19028?
>
>> This does not happen because tar would be slow in moving data to
>> gzip: tar reads from SSD, or more likely from ARC, and this is
>> always faster than gzip -9. The imbalance is made by the scheduler.
>
> When I tried that tar line, I got lots of output to stderr:
>
> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
> tar: Removing leading '/' from member names
> a .
> a root
> a wrkdirs
> a bin
> a usr
> . . .
>
> Was that an intentional part of the test?
>
> To avoid this I used:
>
> # tar cvf - / 2>/dev/null | cpuset -l 13 gzip -9 2>&1 > /dev/null
>
> At which point I get the likes of:
>
> 17129 root   1  68   0  14192Ki   3628Ki RUN    13   0:20  3.95% gzip -9
> 17128 root   1  20   0  58300Ki  13880Ki pipdwt 18   0:00  0.27% tar cvf - / (bsdtar)
> 17097 root   1 133   0  13364Ki   3060Ki CPU13  13   8:05 95.93% sh -c while true; do :; done
>
> up front.
>
> For reference, I also see the likes of the following from
> "gstat -spod" (it is a root-on-ZFS context with PCIe Optane media):
>
> dT: 1.063s  w: 1.000s
>  L(q)  ops/s    r/s   kB   kBps   ms/r    w/s   kB   kBps   ms/w    d/s   kB   kBps   ms/d    o/s   ms/o   %busy Name
> . . .
>     0     68     68   14    937    0.0      0    0      0    0.0      0    0      0    0.0      0    0.0    0.1| nvd2
> . . .
>

I left it running and I'm now seeing:

17129 root   1 107   0  14192Ki   3628Ki CPU13  13   3:01 48.10% gzip -9
17128 root   1  21   0  58300Ki  15428Ki pipdwt 20   0:04  2.02% tar cvf - / (bsdtar)
17097 root   1 115   0  13364Ki   3060Ki RUN    13  16:30 51.77% sh -c while true; do :; done

Also examples of the likes of:

dT: 1.063s  w: 1.000s
 L(q)  ops/s    r/s   kB   kBps   ms/r    w/s   kB   kBps   ms/w    d/s   kB   kBps   ms/d    o/s   ms/o   %busy Name
. . .
    0   1213   1213    5   6456    0.0      0    0      0    0.0      0    0      0    0.0      0    0.0    1.2| nvd2
. . .

FYI: ThreadRipper 1950X context.

Looks like what I'll see is very dependent on when I look
at what it is doing: the details involved matter.

===
Mark Millard
marklmi at yahoo.com
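
For anyone wanting to repeat the comparison, the commands quoted above can be
combined into a small script with a sampling loop. The following is only a
sketch, not part of the original exchange: the core number 13, the 5-second
sampling interval, and the ps field list are arbitrary choices, and it needs
to run as root so tar can read the whole filesystem.

#!/bin/sh
# Minimal reproduction sketch for the tar | gzip vs. busy-loop test above.
# CORE, the sampling interval, and the ps fields are arbitrary; adjust CORE
# to an otherwise idle core on your machine.
CORE=13

# Record the SCHED_ULE preemption threshold discussed in the linked thread.
sysctl kern.sched.preempt_thresh

# Pure compute loop: no syscalls.
cpuset -l "$CORE" sh -c 'while true; do :; done' &
loop_pid=$!

# Syscall-heavy competitor: gzip -9 fed by tar, stderr silenced as in the
# adjusted command quoted above.
tar cvf - / 2>/dev/null | cpuset -l "$CORE" gzip -9 > /dev/null &
gzip_pid=$!

# Sample scheduling priority (pri) and CPU share of the two competitors
# until the tar | gzip pipeline finishes.
while kill -0 "$gzip_pid" 2>/dev/null; do
    ps -ax -o pid,pri,%cpu,command | grep -E 'gzip -9|while true' | grep -v grep
    echo
    sleep 5
done

kill "$loop_pid"

Watching the sampled pri values should show the same drift reported above:
roughly equal priorities at first, then the busy loop taking most of the core
once gzip's syscall rate rises.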