From: Harry Schmalzbauer
Subject: cpuset(1) and affinity vs. masking
To: freebsd-arm@freebsd.org
Organization: OmniLAN
Date: Sun, 23 Jul 2023 15:39:15 +0200
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

Hello,

I hope it is OK to ask some very basic questions here on -arm, where the more arch/dev-related ARM topics are probably discussed usually. I found many posts where people (those asking as well as those responding) were confusing NUMA/malloc and scheduler-related questions/issues, and aarch64/amd64 likewise... That's why I tend to ask here first.

I'm on rk3399 (Pine64 RockPro64) and new to aarch64 and big.LITTLE with FreeBSD, and I'm just looking for a way to teach the scheduler (default ULE) to prefer the fast cores.

As far as I understand cpuset(1), it can only mask cores to be excluded.
What I can observe is that sched_ule distributes threads across all 6 cores without any notable affinity: single-threaded tasks are often spread over CPUs 0-3 (slow A53) while the fast cores 4 and 5 (more power-hungry A72) are idle. As long as power consumption isn't crucial, this default behaviour can't be intentional.

A simple real-world test utilizing xz/pkg(1) shows close to 100% performance penalty with the out-of-the-box sched_ule behaviour:

( time pkg -o ABI_FILE=/usr/src/worldstage/usr/bin/uname \
    -o ALLOW_BASE_SHLIBS=yes create -f txz \
    -M /usr/src/worldstage/openssl-dev.ucl \
    -p /usr/src/worldstage/openssl-dev.plist \
    -r /usr/src/worldstage \
    -o /usr/obj/usr/src/repo/FreeBSD:13:aarch64/13.snap20230723104317 )

Invoked from 'cpuset -c -l 4,5 /bin/sh' results in
    63.38 real        63.09 user         0.28 sys
while invoked from 'cpuset -c -l 0-3 /bin/sh' results in
    118.58 real       118.05 user         0.52 sys

(BTW, regarding power consumption: I can hardly imagine that running 2 minutes on A53 cores saves power compared to running half the time on the A72 cores, but that's a totally different story for me for now.)

I'm looking for real affinity. Meaning: every core is allowed, but the fat ones are preferred, i.e. always used until all of them are overloaded, and threads are re-assigned immediately. Fat-core cycles mustn't ever be spent idle as long as slow cores are utilized.

What I found so far regarding rk3399 tuning:
https://lists.freebsd.org/pipermail/freebsd-arm/2020-July/022105.html,
which is more about cpufreq(4). That is still an issue in my opinion, but since
    sysctl dev.cpu.4.freq=1800
    sysctl dev.cpu.3.freq=1416
work these days, it doesn't bother me much.
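As a quick check of the two timings quoted above, the slowdown of the A53-only run relative to the A72-only run can be computed directly. This is only a minimal Python sketch; the variable and function names are illustrative and not part of any tool mentioned here:

```python
# Wall-clock ("real") seconds copied from the two runs quoted above.
real_a72 = 63.38    # invoked from 'cpuset -c -l 4,5 /bin/sh'
real_a53 = 118.58   # invoked from 'cpuset -c -l 0-3 /bin/sh'


def slowdown(fast_s: float, slow_s: float) -> float:
    """Return how many times longer the slow run took than the fast run."""
    return slow_s / fast_s


ratio = slowdown(real_a72, real_a53)
penalty_pct = (ratio - 1.0) * 100.0  # extra wall-clock time in percent
print(f"A53 run took {ratio:.2f}x as long ({penalty_pct:.0f}% longer)")
```

Note that this compares the two pinned runs only; it says nothing about where an unpinned run under the default scheduler would land between them.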
In another discussion, a reference to the FDT cpu-map binding was posted:
https://mjmwired.net/kernel/Documentation/devicetree/bindings/cpu

On my RockPro64, kern.sched.topology_spec doesn't seem to define two groups/clusters:
    0, 1, 2, 3, 4, 5

But dmesg shows traces of affinity groups, since there's something like [ 0 0, 0 1, 0 2, 0 3] and [ 1 0, 1 1] printed next to "affinity:" for each CPU... I simply don't understand how sched_ule is supposed to make use of this information. I guess it doesn't (yet).

CPU  0: ARM Cortex-A53 r0p4 affinity:  0  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 =
 Instruction Set Attributes 1 = <>
 Instruction Set Attributes 2 = <>
         Processor Features 0 =
         Processor Features 1 = <>
      Memory Model Features 0 =
      Memory Model Features 1 = <8bit VMID>
      Memory Model Features 2 = <32bit CCIDX,48bit VA>
             Debug Features 0 =
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 =
AArch32 Media and VFP Features 0 =
AArch32 Media and VFP Features 1 =
CPU  1: ARM Cortex-A53 r0p4 affinity:  0  1
CPU  2: ARM Cortex-A53 r0p4 affinity:  0  2
CPU  3: ARM Cortex-A53 r0p4 affinity:  0  3
CPU  4: ARM Cortex-A72 r0p2 affinity:  1  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
      Memory Model Features 0 =
CPU  5: ARM Cortex-A72 r0p2 affinity:  1  1

The cpuset(1) policy:domain-list setting affects malloc only, as far as I understand...

Any hints for more resources (besides /usr/src) are highly appreciated!

Thanks in advance,

-harry