From nobody Fri Jan 10 15:04:00 2025
Date: Fri, 10 Jan 2025 15:04:00 GMT
Message-Id: <202501101504.50AF40mI057411@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Robert Clausecker
Subject: git: 3863fec1ce2d - main - lib/libc/aarch64/string: add strlen SIMD implementation
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-BeenThere: dev-commits-src-all@freebsd.org
Sender:
 owner-dev-commits-src-all@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: fuz
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 3863fec1ce2dc6033f094a085118605ea89db9e2
Auto-Submitted: auto-generated

The branch main has been updated by fuz:

URL: https://cgit.FreeBSD.org/src/commit/?id=3863fec1ce2dc6033f094a085118605ea89db9e2

commit 3863fec1ce2dc6033f094a085118605ea89db9e2
Author:     Getz Mikalsen
AuthorDate: 2024-08-26 19:54:32 +0000
Commit:     Robert Clausecker
CommitDate: 2025-01-10 15:02:40 +0000

    lib/libc/aarch64/string: add strlen SIMD implementation

    Adds a SIMD-enhanced strlen for AArch64. It takes inspiration from the
    amd64 implementation, but I struggled to get the performance I had
    hoped for on cores like the Graviton3 when compared to the existing
    implementation from Arm Optimized Routines. See the DR for benchmark
    results.

    Tested by:      fuz (exprun)
    Reviewed by:    fuz, emaste
    Sponsored by:   Google LLC (GSoC 2024)
    PR:             281175
    Differential Revision: https://reviews.freebsd.org/D45623
---
 lib/libc/aarch64/string/Makefile.inc |  4 ++--
 lib/libc/aarch64/string/strlen.S     | 46 ++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/lib/libc/aarch64/string/Makefile.inc b/lib/libc/aarch64/string/Makefile.inc
index f8c67319fe12..7325b54d9716 100644
--- a/lib/libc/aarch64/string/Makefile.inc
+++ b/lib/libc/aarch64/string/Makefile.inc
@@ -14,7 +14,6 @@ AARCH64_STRING_FUNCS= \
 	strchr \
 	strchrnul \
 	strcpy \
-	strlen \
 	strnlen \
 	strrchr
 
@@ -30,7 +29,8 @@ MDSRCS+= \
 	strncmp.S \
 	memccpy.S \
 	strncat.c \
-	strlcat.c
+	strlcat.c \
+	strlen.S
 
 #
 # Add the above functions. Generate an asm file that includes the needed
diff --git a/lib/libc/aarch64/string/strlen.S b/lib/libc/aarch64/string/strlen.S
new file mode 100644
index 000000000000..7bfac7f4b1e1
--- /dev/null
+++ b/lib/libc/aarch64/string/strlen.S
@@ -0,0 +1,46 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2024 Getz Mikalsen
+*/
+
+#include
+
+	.weak	strlen
+	.set	strlen, __strlen
+	.text
+
+ENTRY(__strlen)
+	bic	x10, x0, #0xf		// aligned src
+	and	x9, x0, #0xf
+	ldr	q0, [x10]
+	cmeq	v0.16b, v0.16b, #0
+	shrn	v0.8b, v0.8h, #4
+	fmov	x1, d0
+	cbz	x9, .Laligned
+	lsl	x2, x0, #2		// get the byte offset
+	lsr	x1, x1, x2		// shift by offset index
+	cbz	x1, .Lloop
+	rbit	x1, x1
+	clz	x0, x1
+	lsr	x0, x0, #2
+	ret
+
+.Laligned:
+	cbnz	x1, .Ldone
+
+.Lloop:
+	ldr	q0, [x10, #16]!
+	cmeq	v0.16b, v0.16b, #0
+	shrn	v0.8b, v0.8h, #4	// reduce to fit mask in GPR
+	fcmp	d0, #0.0
+	b.eq	.Lloop
+	fmov	x1, d0
+.Ldone:
+	sub	x0, x10, x0
+	rbit	x1, x1			// reverse bits as NEON has no ctz
+	clz	x3, x1
+	lsr	x3, x3, #2
+	add	x0, x0, x3
+	ret
+END(__strlen)
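
For readers who do not read A64 assembly, the routine above can be
modelled in portable C. This is only a sketch and not part of the commit.
The names nibble_mask, ctz64 and model_strlen are made up for
illustration. The model also assumes the caller passes a buffer whose
size is a multiple of 16 plus a start index, so "alignment" is expressed
as an index rather than an address; this avoids the out-of-bounds chunk
reads that the assembly can do safely but C cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of "cmeq v0.16b, v0.16b, #0" followed by "shrn v0.8b, v0.8h, #4":
 * a 64-bit mask with one nibble per input byte, 0xf where the byte is NUL.
 */
uint64_t
nibble_mask(const unsigned char *chunk)
{
	uint64_t mask = 0;

	for (int i = 0; i < 16; i++)
		if (chunk[i] == 0)
			mask |= (uint64_t)0xf << (4 * i);
	return (mask);
}

/* Count trailing zero bits; stands in for the rbit + clz pair. */
unsigned
ctz64(uint64_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while ((x & 1) == 0) {
		x >>= 1;
		n++;
	}
	return (n);
}

/*
 * Hypothetical model of __strlen.  buf stands for 16-byte aligned memory,
 * start is the index of the first byte of the string.  The buffer must
 * contain a NUL at or after start and be a multiple of 16 bytes long.
 */
size_t
model_strlen(const unsigned char *buf, size_t start)
{
	size_t base = start & ~(size_t)0xf;	/* bic x10, x0, #0xf */
	unsigned head = start & 0xf;		/* and x9, x0, #0xf */
	uint64_t mask = nibble_mask(buf + base);

	/* Drop matches that lie before the start of the string. */
	mask >>= 4 * head;
	if (mask != 0)
		return (ctz64(mask) / 4);	/* 4 mask bits per byte */

	/* Main loop: one aligned 16-byte chunk per iteration. */
	do {
		base += 16;
		mask = nibble_mask(buf + base);
	} while (mask == 0);

	return (base - start + ctz64(mask) / 4);
}
```

With a zero-filled 32-byte buffer holding "abc" at index 3,
model_strlen(buf, 3) returns 3: the first chunk's mask is shifted right
by 12 bits to discard the three NUL bytes in front of the string, and
the first remaining set bit sits at bit 12, i.e. byte 3 of the string.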