Date: Thu, 3 Aug 2023 22:55:50 GMT
Message-Id: <202308032255.373Mtolv072030@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Robert Clausecker
Subject: git: d8385768fb12 - main - lib/libc/amd64/string/strlen.S: add amd64 baseline kernel

The branch main has been updated by fuz:

URL: https://cgit.FreeBSD.org/src/commit/?id=d8385768fb12e6205d73a20ad05fba9f3281b6e1

commit d8385768fb12e6205d73a20ad05fba9f3281b6e1
Author:     Robert Clausecker
AuthorDate: 2023-08-03 22:48:32 +0000
Commit:     Robert Clausecker
CommitDate: 2023-08-03 22:54:23 +0000

    lib/libc/amd64/string/strlen.S: add amd64 baseline kernel

    This performs very well: in the benchmark below it is about 60% faster
    than the scalar kernel on short and mid-length strings and about 3.4
    times as fast on long strings.  x86-64-v3 and x86-64-v4 kernels were
    written, too, but performed worse than the baseline kernel on short
    strings.  These may be added later if their performance issues can be
    fixed.

    os: FreeBSD
    arch: amd64
    cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
            │ strlen_scalar.out │         strlen_baseline.out          │
            │        B/s        │      B/s      vs base                │
    Short          1.667Gi ± 1%    2.676Gi ± 1%   +60.55% (p=0.000 n=20)
    Mid            5.459Gi ± 1%    8.756Gi ± 1%   +60.39% (p=0.000 n=20)
    Long           15.34Gi ± 0%    52.27Gi ± 0%  +240.64% (p=0.000 n=20)
    geomean        5.188Gi         10.70Gi       +106.24%

    Sponsored by: The FreeBSD Foundation
    Approved by: kib
    Reviewed by: mjg jrtc27
    Differential Revision: https://reviews.freebsd.org/D40693
---
 lib/libc/amd64/string/strlen.S | 58 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/lib/libc/amd64/string/strlen.S b/lib/libc/amd64/string/strlen.S
index 1d2428e3420e..7e2514de44b0 100644
--- a/lib/libc/amd64/string/strlen.S
+++ b/lib/libc/amd64/string/strlen.S
@@ -1,11 +1,18 @@
-/*
+/*-
  * Written by Mateusz Guzik
+ * Copyright (c) 2023 The FreeBSD Foundation
+ *
+ * Portions of this software were developed by Robert Clausecker
+ * under sponsorship from the FreeBSD Foundation.
+ *
  * Public domain.
  */
 
 #include
 __FBSDID("$FreeBSD$");
 
+#include "amd64_archlevel.h"
+
 /*
  * Note: this routine was written with kernel use in mind (read: no simd),
  * it is only present in userspace as a temporary measure until something
@@ -14,6 +21,11 @@ __FBSDID("$FreeBSD$");
 
 #define ALIGN_TEXT	.p2align 4,0x90 /* 16-byte alignment, nop filled */
 
+ARCHFUNCS(strlen)
+	ARCHFUNC(strlen, scalar)
+	ARCHFUNC(strlen, baseline)
+ENDARCHFUNCS(strlen)
+
 /*
  * strlen(string)
  *	%rdi
@@ -30,7 +42,7 @@ __FBSDID("$FreeBSD$");
  *
  * The latter contains a 32-bit variant of the same algorithm coded in assembly for i386.
  */
-ENTRY(strlen)
+ARCHENTRY(strlen, scalar)
 	movabsq $0xfefefefefefefeff,%r8
 	movabsq $0x8080808080808080,%r9
 
@@ -76,6 +88,46 @@ ENTRY(strlen)
 	leaq	(%rcx,%rdi),%rax
 	subq	%r10,%rax
 	ret
-END(strlen)
+ARCHEND(strlen, scalar)
+
+ARCHENTRY(strlen, baseline)
+	mov	%rdi, %rcx
+	pxor	%xmm1, %xmm1
+	and	$~0xf, %rdi		# align string
+	pcmpeqb	(%rdi), %xmm1		# compare head (with junk before string)
+	mov	%rcx, %rsi		# string pointer copy for later
+	and	$0xf, %ecx		# amount of bytes rdi is past 16 byte alignment
+	pmovmskb %xmm1, %eax
+	add	$32, %rdi		# advance to next iteration
+	shr	%cl, %eax		# clear out matches in junk bytes
+	test	%eax, %eax		# any match? (can't use ZF from SHR as CL=0 is possible)
+	jnz	2f
+
+	ALIGN_TEXT
+1:	pxor	%xmm1, %xmm1
+	pcmpeqb	-16(%rdi), %xmm1	# find NUL bytes
+	pmovmskb %xmm1, %eax
+	test	%eax, %eax		# were any NUL bytes present?
+	jnz	3f
+
+	/* the same unrolled once more */
+	pxor	%xmm1, %xmm1
+	pcmpeqb	(%rdi), %xmm1
+	pmovmskb %xmm1, %eax
+	add	$32, %rdi		# advance to next iteration
+	test	%eax, %eax
+	jz	1b
+
+	/* match found in loop body */
+	sub	$16, %rdi		# undo half the advancement
+3:	tzcnt	%eax, %eax		# find the first NUL byte
+	sub	%rsi, %rdi		# string length until beginning of (%rdi)
+	lea	-16(%rdi, %rax, 1), %rax	# that plus loc. of NUL byte: full string length
+	ret
+
+	/* match found in head */
+2:	tzcnt	%eax, %eax		# compute string length
+	ret
+ARCHEND(strlen, baseline)
 
 	.section .note.GNU-stack,"",%progbits
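For readers following the assembly, the logic of the baseline kernel can be modeled in portable C. This is a sketch, not the committed code: `nul_mask16()` stands in for the PXOR/PCMPEQB/PMOVMSKB sequence and `lowest_set_bit()` for TZCNT, and the names `model_strlen`, `nul_mask16` and `lowest_set_bit` are hypothetical. Because reading outside an object is undefined in C, memory is modeled as an array whose index 0 is assumed to be 16-byte aligned. The assembly, by contrast, reads the junk bytes before and after the string directly, which is safe in practice because an aligned 16-byte load never crosses a page boundary.

```c
#include <stddef.h>

/*
 * Sketch: portable C model of the SSE2 "baseline" strlen kernel, not the
 * committed code.  Assumption: mem[0] is treated as 16-byte aligned and the
 * array length is a multiple of 16 with a NUL at or after `start`, so every
 * aligned 16-byte chunk read below stays inside the array.
 */

/* Stand-in for PXOR + PCMPEQB + PMOVMSKB: bit j is set iff chunk[j] == 0. */
static unsigned nul_mask16(const unsigned char *chunk)
{
	unsigned mask = 0;

	for (unsigned j = 0; j < 16; j++)
		if (chunk[j] == 0)
			mask |= 1u << j;
	return mask;
}

/* Stand-in for TZCNT on a nonzero mask: index of the lowest set bit. */
static unsigned lowest_set_bit(unsigned mask)
{
	unsigned n = 0;

	while ((mask & 1u) == 0) {
		mask >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical helper: length of the string that starts at mem[start]. */
size_t model_strlen(const unsigned char *mem, size_t start)
{
	size_t chunk = start & ~(size_t)15;	/* align down, like and $~0xf */
	unsigned skip = (unsigned)(start & 15);	/* junk bytes before the string */
	/* like shr %cl: discard matches that lie in the junk bytes */
	unsigned mask = nul_mask16(mem + chunk) >> skip;

	if (mask != 0)			/* NUL found in the head chunk */
		return lowest_set_bit(mask);

	for (;;) {			/* scan the following aligned chunks */
		chunk += 16;
		mask = nul_mask16(mem + chunk);
		if (mask != 0)
			return chunk + lowest_set_bit(mask) - start;
	}
}
```

The committed loop checks two 16-byte chunks per iteration to amortize the loop overhead; the model checks one at a time, which yields the same length. For example, on a 64-byte array holding a NUL at index 40, `model_strlen(mem, 5)` returns 35.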
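The two constants loaded by the scalar kernel point to the classic has-zero-byte test. Adding 0xfefefefefefefeff is the same as subtracting 0x0101010101010101 modulo 2^64, and `(x - 0x0101...) & ~x & 0x8080...` is nonzero exactly when some byte of `x` is zero, with the lowest flagged byte marking the first NUL. The following portable C sketch illustrates that test under this assumption. It is not a transcription of the scalar kernel, whose body the diff shows only in part. The helper names are hypothetical, and scanning the unaligned head byte by byte is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the has-zero-byte test suggested by the scalar kernel's
 * constants; an illustration, not a transcription of the committed code.
 * Adding 0xfefefefefefefeff equals subtracting 0x0101010101010101 mod 2^64.
 * The result is nonzero iff some byte of `word` is zero, and its lowest set
 * bit lies in the least significant zero byte.
 */
static uint64_t zero_byte_bits(uint64_t word)
{
	return (word + 0xfefefefefefefeffULL) & ~word & 0x8080808080808080ULL;
}

/* Load 8 bytes as a little-endian word, as a 64-bit load does on amd64. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return w;
}

/* Byte index (0..7) of the first zero byte, given nonzero zero_byte_bits(). */
static unsigned first_zero_byte(uint64_t bits)
{
	unsigned n = 0;

	while ((bits & 0x80) == 0) {
		bits >>= 8;
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: length of the string at mem[start], with mem[0]
 * assumed 8-byte aligned and the array length a multiple of 8.  The head
 * is scanned byte by byte for simplicity.
 */
size_t swar_strlen(const unsigned char *mem, size_t start)
{
	size_t i = start;

	while (i & 7) {			/* unaligned head, one byte at a time */
		if (mem[i] == 0)
			return i - start;
		i++;
	}
	for (;; i += 8) {		/* then one aligned word at a time */
		uint64_t bits = zero_byte_bits(load_le64(mem + i));

		if (bits != 0)
			return i + first_zero_byte(bits) - start;
	}
}
```

Bytes above the first zero byte can be flagged spuriously because of the borrow, which is why only the lowest flagged byte is used to compute the length.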