Re: Periodic rant about SCHED_ULE
Date: Thu, 23 Mar 2023 04:09:22 UTC
[I added a -j32 buildworld buildkernel with SCHED_4BSD and
dnetc-in-use comparison, to the other ThreadRipper 1950X
examples. SCHED_4BSD does take notably less time than
SCHED_ULE when dnetc is also active: still a good match to
the simple round-robin for this building activity. I will
note that the 1950X UEFI/firmware is not configured to
present itself as NUMA but the FreeBSD kernels in use are
NUMA capable as built.]

On Mar 22, 2023, at 19:44, Mark Millard <marklmi@yahoo.com> wrote:

> On Mar 22, 2023, at 18:08, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Mar 22, 2023, at 18:03, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On Mar 22, 2023, at 16:17, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> On Mar 22, 2023, at 15:39, Mark Millard <marklmi@yahoo.com> wrote:
>>>>
>>>>> On Mar 22, 2023, at 13:34, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>
>>>>>> On Mar 22, 2023, at 12:40, George Mitchell <george+freebsd@m5p.com> wrote:
>>>>>>
>>>>>>> On 3/22/23 15:21, Mark Millard wrote:
>>>>>>>> George Mitchell <george+freebsd@m5p.com> wrote on
>>>>>>>> Date: Wed, 22 Mar 2023 17:36:39 UTC :
>>>>>>>> [...]
>>>>>>>>> Here are the very complicated instructions for reproducing the problem:
>>>>>>>>> 1. Install and start misc/dnetc from ports.
>>>>>>>> Installing is likely easy, as likely would be building
>>>>>>>> with default options (if any). I know nothing about
>>>>>>>> starting misc/dnetc so that is research. (Possibly
>>>>>>>> trivial, although if it has alternatives to control
>>>>>>>> then I'd need to match that context too.)
>>>>>>>
>>>>>>> service dnetc start
>>>>>>
>>>>>> I built and installed misc/dnetc and got a binary
>>>>>> blob that clearly was not built in my environment:
>>>>>>
>>>>>> # file /usr/local/distributed.net/dnetc
>>>>>> /usr/local/distributed.net/dnetc: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), statically linked, for FreeBSD 10.1 (1001515), FreeBSD-style, stripped
>>>>>>
>>>>>> Way older FreeBSD vintage than the locally available toolchains
>>>>>> would normally build. Some might be cautious about such a thing.
>>>>>>
>>>>>> The man page reported that:
>>>>>>
>>>>>> QUOTE
>>>>>> If you have never run the client before, it will initiate the menu-driven
>>>>>> configuration. Save and quit when done, the configuration file will be
>>>>>> saved in the same directory as the client. Now, simply restart the
>>>>>> client. From that point on it will use the saved configuration.
>>>>>> END QUOTE
>>>>>>
>>>>>> I've not seen what the configuration asks about yet.
>>>>>
>>>>> I went through the configuration, basically just looking
>>>>> at it, other than providing an E-mail address. Then . . .
>>>>>
>>>>> $ sudo service dnetc start
>>>>> Password:
>>>>> Cannot 'start' dnetc. Set dnetc_enable to YES in /etc/rc.conf or use 'onestart' instead of 'start'.
>>>>>
>>>>> $ sudo service dnetc onestart
>>>>>
>>>>> I just let it run without any extra competing activity, other
>>>>> than I had my patched version of top running. It records and
>>>>> reports various maximum-observed (MaxObs) figures, here
>>>>> the load averages being relevant.
>>>>>
>>>>> Top showed that dnetc started 32 processes, one per hardware
>>>>> thread. Mostly I saw: 100% nice and 0% idle.
>>>>>
>>>>> Letting it run and then looking at the load averages (and
>>>>> their matching MaxObs figures) after something like 60+ min
>>>>> (not carefully timed: was doing other things) showed:
>>>>>
>>>>> load averages: 31.97, 31.88, 31.66 MaxObs: 32.12, 31.97, 31.66
>>>>>
>>>>> (Note: The machine had been up for over 2.75 days before
>>>>> starting this and had not been building much of anything
>>>>> during that time.)
>>>>>
>>>>> I've not yet experimented with having other, significant
>>>>> competing activity.
>>>>>
>>>>>>>>> 2. Run "make buildworld".
>>>>>>>> So on the 32 hardware-thread (16 cores) amd64 machine that
>>>>>>>> I have access to, the test is to only have buildworld use
>>>>>>>> about one hardware thread, no matter what else is going on.
>>>>>>>> I never would have guessed that the steps would not involve
>>>>>>>> more like -j$(sysctl -n hw.ncpu) (so around -j32 in this
>>>>>>>> context). So it is good that you provided your note or
>>>>>>>> I'd not know if I'd done similarly or not when trying such.
>>>>>>>> [Note: -j1 and lack of -j are not strictly equivalent in
>>>>>>>> how make operates. As I remember, the distinction makes
>>>>>>>> a notable difference in the number of subprocesses created
>>>>>>>> directly by make (one per action "line" vs. one for the
>>>>>>>> whole block?). So even using -j1 might make a difference
>>>>>>>> vs. what you specified. I'd have to test to see.]
>>>>>>>
>>>>>>> I am literally running "make buildworld" with no additional options.
>>>>>>
>>>>>> So required for repeating your results, but likely making
>>>>>> such results not be interesting relative to how I normally
>>>>>> deal with buildworld buildkernel and the like, no matter
>>>>>> if there is other activity in an overlapping time frame or
>>>>>> not: my time preferences are too strong to wait for a single
>>>>>> hardware thread to do my normal builds, even with no
>>>>>> competing activity on the builder.
>>>>>>
>>>>>>>>> Standard out conveniently reports how long it took (wall clock).
>>>>>>>> But nothing in your instructions indicate about how
>>>>>>>> to get an idea how much progress dnetc made during the
>>>>>>>> various tests? [...]
>>>>>>>
>>>>>>> Honestly, I've never worried about this part. But dnetc logs its
>>>>>>> progress in /usr/local/distributed.net/dnetc.txt, though not in terms
>>>>>>> that are easy to relate to real-world progress. Oddly, when I run
>>>>>>> "make buildworld," I'm primarily interested in getting the world built.
>>>>>>> Perhaps others feel differently.
>>>>>>
>>>>>> Off topic for the specifics of the actual benchmark
>>>>>> that you run:
>>>>>>
>>>>>> Then why not use of -jN ? In my context, any buildworld
>>>>>> using -j1 or no -j at all takes a huge amount of time
>>>>>> longer than letting it use all the hardware threads (or
>>>>>> so). (I've avoided having any I/O bound contexts for
>>>>>> such.) It does not take additional load on the system
>>>>>> for that to be true --including on the 4-core small arm
>>>>>> boards when I happen to buildworld on such (rare).
>>>>>>
>>>>>>
>>>>>>>> [...]
>>>>>>>> FYI: I've never built with and run the alternate
>>>>>>>> scheduler so if there is any appropriate background
>>>>>>>> for that that would not be obvious on finding basic
>>>>>>>> instructions, it would be appropriate to provide
>>>>>>>> such notes.
>>>>>>>> [...]
>>>>>>>
>>>>>>> You have to build a new kernel, using a config file in which you have
>>>>>>> replaced "options SCHED_ULE" with "options SCHED_4BSD". -- George
>>>>>>
>>>>>> Thanks for the notes.
>>>>>>
>>>>>> I've not decided if I'll do anything with the binary
>>>>>> blob or not.
>>>>>
>>>>
>>>> FYI:
>>>>
>>>> It is not your specific experiment, but I started my
>>>> "extra load" experiments with . . .
>>>>
>>>> I started a -j32 buildworld buildkernel with dnetc still
>>>> running. I'm generally seeing around 55% Active and 42%
>>>
>>> Note "Active": user, sorry.
>>>
>>>> nice, < 2% system (it was building libllvm at this point).
>>>> At that time:
>>>>
>>>> load averages: 64.41, 60.52, 49.81 MaxObs: 64.47, 60.52, 49.81
>>>>
>>>
>>> Contrasting results for some obj-lib32 build activity:
>>> much more variety of User, nice, and system, including
>>> times with < 5% user, 90+% nice. But not typical overall.
>>> But lots of time roughly around 50%/50% or 35%/60%. There
>>> were times with 15+% system.
>>>
>>> Somewhat after buildkernel started:
>>>
>>> load averages: 69.15, 64.12, 58.72 MaxObs: 75.98, 64.12, 58.72
>>>
>>> Harder to summarize, so overall timing reports from the
>>> buildworld and buildkernel stages.
>>>
>>>
>>> buildworld:
>>>
>>> --------------------------------------------------------------
>>> ... World build completed on Wed Mar 22 16:37:57 PDT 2023
>>> ... World built in 2615 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>>
>>>
>>> buildkernel:
>>>
>>> --------------------------------------------------------------
>>> ... Kernel build for GENERIC-NODBG completed on Wed Mar 22 16:43:10 PDT 2023
>>> --------------------------------------------------------------
>>> ... Kernel(s) GENERIC-NODBG built in 311 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>>
>>> Afterwards:
>>>
>>> load averages: 36.08, 53.14, 55.79 MaxObs: 75.98, 65.77, 59.84
>>>
>>>
>>> I then did (not all in the same window):
>>>
>>> $ sudo service dnetc onestop
>>> # rm -fr /usr/obj/BUILDs/main-amd64-nodbg-clang-alt/usr/
>>>
>>> before another -j32 buildworld buildkernel (no dnetc). The
>>> results for this were:
>>>
>>>
>>> buildworld:
>>>
>>> --------------------------------------------------------------
>>> ... World build completed on Wed Mar 22 17:39:19 PDT 2023
>>> ... World built in 1240 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>>
>>> (compared to the 2615 for dnetc also in use)
>>>
>>>
>>> buildkernel:
>>>
>>> --------------------------------------------------------------
>>> ... Kernel build for GENERIC-NODBG completed on Wed Mar 22 17:41:17 PDT 2023
>>> --------------------------------------------------------------
>>> ... Kernel(s) GENERIC-NODBG built in 118 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>>
>>> (compared to the 311 for dnetc also in use)
>>
>> I forgot to show the MaxObs load averages for the no-dnetc
>> context:
>>
>> MaxObs: 39.77, 32.15, 25.75
>>
>>> Experiments without -j32 will take a lot longer, even
>>> without dnetc in use. I'm not sure there will be such
>>> results today.
>>>
>>
>
> I decided to do some more of the less time consuming
> testing. SCHED_4BSD, no dnetc, -j32 buildworld buildkernel :
>
>
> buildworld:
>
> --------------------------------------------------------------
> ... World build completed on Wed Mar 22 19:16:35 PDT 2023
> ... World built in 1235 seconds, ncpu: 32, make -j32
> --------------------------------------------------------------
>
> (compared to 1240 for SCHED_ULE)
>
> So: no significant difference.
>
>
> buildkernel (SCHED_4BSD building a SCHED_4BSD):
>
> --------------------------------------------------------------
> ... Kernel build for GENERIC-NODBG-SCHED_4BSD completed on Wed Mar 22 19:18:34 PDT 2023
> --------------------------------------------------------------
> ... Kernel(s) GENERIC-NODBG-SCHED_4BSD built in 119 seconds, ncpu: 32, make -j32
> --------------------------------------------------------------
>
> (compared to 118 for SCHED_ULE building a SCHED_ULE)
>
> So: no significant difference.

I again forgot to show MaxObs load averages
(for the above): MaxObs: 39.23, 31.58, 24.30

> I'll try it with dnetc also active.
>

I still have no good indication of dnetc progress to
allow comparison of the combination. So the below
focuses on buildworld buildkernel. I expect that the
comparative results reflect a buildworld/buildkernel
vs. dnetc progress tradeoff, though I cannot quantify
it well.

The below are with dnetc also active.

load averages, MaxObs: 73.03, 65.48, 56.30 (I remembered this time!)

buildworld:

--------------------------------------------------------------
... World build completed on Wed Mar 22 20:15:56 PDT 2023
... World built in 1667 seconds, ncpu: 32, make -j32
--------------------------------------------------------------

(compared to 2615 for SCHED_ULE with dnetc and to 1240 or so for no dnetc)

buildkernel:

--------------------------------------------------------------
... Kernel build for GENERIC-NODBG-SCHED_4BSD completed on Wed Mar 22 20:18:34 PDT 2023
--------------------------------------------------------------
... Kernel(s) GENERIC-NODBG-SCHED_4BSD built in 158 seconds, ncpu: 32, make -j32
--------------------------------------------------------------

(compared to 311 for SCHED_ULE with dnetc and to 118 or so for no dnetc)

With dnetc active, it does not take being near -j1 (or
no -j) for buildworld buildkernel to take noticeably less
time under SCHED_4BSD than under SCHED_ULE: -j32 (the
number of hardware threads, 16 cores) also takes
noticeably less time. buildworld buildkernel in this
context seems to be a good match to SCHED_4BSD and its
round-robin. (I make no general claim to SCHED_4BSD being
better across a large range of contexts.) I've not decided
if I'll try anything like a -j1 or no -j alternative.

Without dnetc active, SCHED_ULE and SCHED_4BSD did not
show much of a difference. For how I use the builder
machines, nothing here suggests that the scheduler choice
is significant for my system-build activities. I've not
tested port building in poudriere-devel as I configure it,
but nothing suggests to me that I should expect a
significant difference between the two schedulers for the
way I build packages from ports.

===
Mark Millard
marklmi at yahoo.com
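
For reference, a minimal sketch of the arithmetic behind the "compared to" notes above, using only the wall-clock seconds reported in this thread. The dictionary layout and the slowdown() helper are illustrative names, not part of any FreeBSD tooling, and the ratios say nothing about how much progress dnetc made in the same interval.

```python
# Wall-clock build times in seconds, copied from the -j32 reports in this thread.
times = {
    ("SCHED_ULE", "no dnetc"): {"buildworld": 1240, "buildkernel": 118},
    ("SCHED_ULE", "dnetc"): {"buildworld": 2615, "buildkernel": 311},
    ("SCHED_4BSD", "no dnetc"): {"buildworld": 1235, "buildkernel": 119},
    ("SCHED_4BSD", "dnetc"): {"buildworld": 1667, "buildkernel": 158},
}


def slowdown(sched, stage):
    """Ratio of the build time with dnetc running to the time without it."""
    return times[(sched, "dnetc")][stage] / times[(sched, "no dnetc")][stage]


for sched in ("SCHED_ULE", "SCHED_4BSD"):
    for stage in ("buildworld", "buildkernel"):
        print(f"{sched} {stage}: {slowdown(sched, stage):.2f}x with dnetc active")
```

On these figures the builds take roughly 2.1x to 2.6x as long under SCHED_ULE with dnetc active, versus roughly 1.3x under SCHED_4BSD.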