Re: Too many pythons in poudriere
- Reply: bob prohaska : "Re: Too many pythons in poudriere"
- In reply to: Mark Millard via freebsd-ports : "Re: Too many pythons in poudriere"
Date: Fri, 09 Jul 2021 19:46:47 UTC
On 2021-Jul-8, at 14:11, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-Jul-8, at 10:54, bob prohaska <fbsd at www.zefox.net> wrote:
>
>> On Thu, Jul 08, 2021 at 09:41:13AM -0700, Mark Millard wrote:
>>>
>>> On 2021-Jul-8, at 08:45, bob prohaska <fbsd at www.zefox.net> wrote:
>>>
>>>> Even with -J1 and no ALLOW_MAKE_JOBS I'm still
>>>> seeing five pythons occupying at least 3 GB on
>>>> the loose.
>>>
>>> Actually I just looked and saw:
>>>
>>> Swapinfo 7.36%
>>>
>>> (Unlike the 83% or so I saw around 3 hours ago.)
>>>
>>> Load Averages (220%) 2.20 2.18 1.76
>>>
>>> Elapsed 12:54:56
>>>
>>> I do not see a swaplog in http://www.zefox.org/~bob/swaplogs/
>>> to look at. So I cannot see how much the peak swap space
>>> usage was so far (approximately).
>>
>> Started a new swaplog:
>> http://www.zefox.org/~bob/swaplogs/202107182930.log
>>
>> It came within a whisker of running out of swap, then abruptly
>> the python threads vanished and the build seems to be proceeding.
>
> Did you update anything, such as /usr/ports/, between the 50+ hr
> run and the new one? The new one spent over 6 hours at:
>
> [05:50:59] [ 8% 3914/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py gen/third_party/blink/renderer/modules/event_interface_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [12:22:32] [ 8% 3915/47953] python ../../third_party/blink/renderer/bindings/scripts/generate_bindings.py --web_idl_database gen/third_party/blink/renderer/bindings/web_idl_database.pickle --root_src_dir ../../ --root_gen_dir gen --output_core_reldir third_party/blink/renderer/bindings/core/v8/ --output_modules_reldir third_party/blink/renderer/bindings/modules/v8/ enumeration callback_function callback_interface interface namespace typedef union
> [12:22:36] [ 8% 3916/47953] touch obj/third_party/blink/renderer/bindings/generate_bindings_all.stamp
> [12:22:39] [ 8% 3917/47953] touch obj/third_party/blink/renderer/bindings/modules/event_modules_names.stamp
> [12:22:42] [ 8% 3918/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py ../../third_party/blink/renderer/modules/event_target_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [12:22:42] [ 8% 3919/47953] touch obj/third_party/blink/renderer/bindings/modules/event_target_modules_names.stamp
>
> The 50+ hour one did not:
>
> [03:56:05] [ 8% 3848/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py gen/third_party/blink/renderer/modules/event_interface_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [03:56:05] [ 8% 3849/47953] touch obj/third_party/blink/renderer/bindings/modules/event_modules_names.stamp
> [03:56:06] [ 8% 3850/47953] python ../../third_party/blink/renderer/bindings/scripts/collect_idl_files.py --idl_list_file __third_party_blink_renderer_bindings_web_idl_in_core_for_testing___build_toolchain_linux_clang_arm64__rule.rsp --component core --output gen/third_party/blink/renderer/bindings/web_idl_in_core_for_testing.pickle --for_testing
> [03:56:06] [ 8% 3851/47953] touch obj/third_party/blink/renderer/bindings/web_idl_in_core_for_testing.stamp
> [03:56:06] [ 8% 3852/47953] touch obj/third_party/blink/renderer/bindings/scripts/cached_jinja_templates.stamp
> [03:56:06] [ 8% 3853/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py ../../third_party/blink/renderer/modules/event_target_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [03:56:09] [ 8% 3854/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/core/style/make_computed_style_initial_values.py ../../third_party/blink/renderer/core/css/css_properties.json5 ../../third_party/blink/renderer/core/css/computed_style_field_aliases.json5 ../../third_party/blink/renderer/platform/runtime_enabled_features.json5 ../../third_party/blink/renderer/core/style/computed_style_extra_fields.json5 --output_dir gen/third_party/blink/renderer/core/style --gperf gperf
> [03:56:09] [ 8% 3855/47953] touch obj/third_party/blink/renderer/bindings/modules/event_target_modules_names.stamp
>
> The build step numbers are different for the same
> command:
>
> 3914/47953
> vs.
> 3848/47953
>
> (But I do not know if the build technique tries to
> keep the partial ordering of build steps stable
> across build attempts from similar starting
> conditions.)
>
> It almost seems like a system-level check of which
> process(es) have large amounts of swap space in use would
> be appropriate when this sort of thing happens, not
> that I've done such before. My understanding is that
> top's per-process reporting of swap space usage is
> problematic when the display is set to show such
> information.
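For what it is worth, here is a rough sketch of such a check. The
per-process figures are approximate, and it assumes the top(1) on
this system still has the -b, -w, and -o options its manual page
documents:

# one batch-mode snapshot, with the approximate per-process SWAP
# column enabled (-w), sorted by resident size, limited to 20 processes:
top -b -w -o res 20

Running something like that while the extra pythons are active should
at least show which processes are holding most of the swap space.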
Going in a different direction: when you updated
/usr/local/poudriere/poudriere-system to have Jail OSVERSION: 1400024
(and probably the matching host OS main-n247590-5dd84e315a9), matching
/usr/src/ as well, did you rebuild all your ports under the coherent
combination? Or are you still using port builds in poudriere that were
built with the jail OSVERSION being older than the /usr/src/ the builds
were based on? Just as devel/llvm10 had a command that only sometimes
had problems when built via the incoherent combination, other ports
could have odd behavior --but only sometimes.

My guess is that you have not done such a rebuild because it would have
taken far more time than I've noticed in your activities. I recommended
before doing such a rebuild of the ports and then a reinstall of them.
Given the odd behaviors being observed that do not repeat each time
tried, I still make that recommendation.
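To make that concrete, here is a sketch of the sort of rebuild meant,
untested here; "main" for the jail name, "default" for the ports tree
name, and the list-file path are only placeholders for whatever is
actually in use:

# rebuild the listed ports from scratch under the now-coherent
# jail / /usr/src / ports-tree combination; -c discards the
# previously built packages first:
poudriere bulk -j main -p default -c -f /path/to/port-list.txt

# then, if the host installs from that poudriere repository,
# force-reinstall its packages from the rebuilt repository:
pkg upgrade -f

That way every installed package ends up built against the same
OSVERSION as the jail and /usr/src/, instead of a mix.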
>> I'm curious if this was blind luck, or some adaptive behavior
>> by poudriere.
>
> Poudriere does not control chrome's internal build steps.
> The notation like [ 8% 3849/47953] is not from poudriere.
>
>> One other oddity: occasionally one sees in top a
>> PID using more than 100% WCPU. Is one thread occupying two cores?
>
> A process can have multiple threads instead of just one,
> but each thread runs on at most one cpu (core here) at a
> time.
>
> Top can do either of:
>
> A) show a count of threads in each process shown
> vs.
> B) show a line for each thread instead of for each
> process
>
> Top can also display the unique thread number instead
> of the process number (shared by all threads in the
> process). (But not both at the same time.) Showing
> thread numbers is probably only commonly selected when
> (B) is in use.
>
> In (B) mode, if the process number is being shown,
> then there will be one example of the number per
> thread shown that is from the process.
>
> But I'll note that, if I remember right, python is
> a single-threaded language. It would probably take
> use of language bindings to other languages for
> multiple threads to be involved (indirectly) in a
> python process.
>
>>>
>>>> I'm fairly sure this didn't happen
>>>> when using make by itself (IIRC it was -j2).
>
> It did not happen at that point in the 50+ hr run
> either.
>
>>>> I also got rid of the mistaken directive in
>>>> poudriere.d/make.conf.
>>>
>>> When I look at http://www.zefox.org/~bob/poudriere.d/make.conf
>>> now I see:
>>>
>>> ALLOW_MAKE_JOBS=yes
>>> #MAKE_JOBS_NUMBER=2
>>> #.if ${.CURDIR:M*www/chromium}
>>> #MAKE_JOBS_NUMBER_LIMIT=2
>>> #.endif
>>> #.if ${.CURDIR:M*databases/sqlite3}
>>> #MAKE_JOBS_NUMBER_LIMIT=2
>>> #.endif
>>> #.if ${.CURDIR:M*www/firefox}
>>> #MAKE_JOBS_NUMBER_LIMIT=2
>>> #.endif
>>>
>>> which does not match your wording.
>>
>> Thank you for catching my error. _now_ it's fixed.
>>
>> [snip]
>>> To see what is getting CPU time that leads to
>>> the load averages being around 2 might take
>>> using something like top sorted by cpu time
>>> and watching for a while.
>>>
>>>> There is a
>>>> #MAX_MEMORY=8
>>>> in poudriere.conf, presumably GB.
>>>
>>> Documented as GiB:
>>>
>>> # How much memory to limit jail processes to for *each builder*
>>> # in GiB (default: none)
>>> #MAX_MEMORY=8
>>>
>>> Per builder, not per-make-process.
>>> Within a builder each make-process shares
>>> that size space with the others.
>>>
>>>> That
>>>> looks like a good knob to play with. Would
>>>> setting it to something like 3 or 4 help?
>>>
>>> If the memory use exceeds what you set, the builder
>>> process is likely killed.
>> [snip]
>>
>> I was hopeful it might inhibit starting new PIDs
>> when memory/swap is below some threshold. Guess not.
>
> poudriere does not control the internals of the
> builder's attempted operations beyond what the
> port supports. (And there are no port-specific
> interfaces for such control. poudriere treats
> ports the same way in general.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early
2018-Mar)