From nobody Mon Dec 25 14:26:06 2023 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4SzKvf5DNnz54n8c; Mon, 25 Dec 2023 14:26:06 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4SzKvf3nttz3NN1; Mon, 25 Dec 2023 14:26:06 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1703514366; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=PkFn0SK5lo3QzRMk+xeUpB1rJRutEHsGceWs2maGZqM=; b=Z3RCMJqs4WS8q3oNScWiu4AeO6Et1vtY9k+oZOMhsE+/AsXUDdpPNKU/Z17Q7YXefq+JyK OER8aoJ5NPRUP4pFkvL5jmz2km95kq4z72FYQzXlvh2eJll5omlFeHo6oSNbWzrfaThJNu yiRa45Gu7FWjcSJ9FIOJ21ufDlAxedFoadfbDWFk9YA/OJljWZUQvOueR8sGKgcC6JDB/G fUhlFRm+8fAv9Eyw0/snLv8G+WaXSwceEy4YD6r/V//o6IuXV08fRVkbcUpu/o32rwVRLX 8yoIe/qPFxue0iuOsCbI6DK0fojq03nebWRVfsN/IVg31Cn/DJ3NsFHpIcNiMA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1703514366; a=rsa-sha256; cv=none; b=AkqXVBLHH5dwvCTuJ49EaHhzJqW8FJAcnINS5X0ENuDvlo5ouMwYi//Mhn+TrgBQoD3ne7 pdTHw8XwuOvNfj0Mf2JfpdI+2HTtJ9e0aU85vcPLKRNhlLcanE8gmWNxTPosKKv97JVf4l jtbVxO4+VIIR8EEQPOvQDkD5m69K/RGNYkssTH4LZGPC6F6hUek5HL1kdyWG3CA5iJlL4h a+phKjkEv/VJDuaV3pV8aJzHKu/w2x1L1AXd+eHXkF1upfnX/LcKvsshY10nJTMwU7X2yi Vg0FDv2CsEuONa7uselaDUXJ5jDk05lrFfKwyfsVDr6t4tq6mX3vwB3vvlg6VQ== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1703514366; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=PkFn0SK5lo3QzRMk+xeUpB1rJRutEHsGceWs2maGZqM=; b=XwTFhWrKgFW7nFjkRIiaKyApz+YiJmFC09QzDlDxkfW7Yw531/he0zDVxzJFJ5am/5JQ2G YjOu+gqmheCORA701wzmGGGG7913Uu14HKYhLxxX2MbeHCdqmEvSKR7DjUJaV4wCxa9XkK dtF9jmbk00ecGVMO8u247rDMg+wwAvInj+XQXbIsCjrwn6JHRlTD1ZIvAo4jhHyaxZLkTC ZKaQY36WLBK16W7AXc6OeK9XmNJiyJEJvQmfsyK9hm4kdx29U4HHy+l/rlb0a45M43kzym lOMa79P2yGg/c2kNyXb+KlTyb7mcGr6xvnqLwj7GZSDPxPDaSd8trJA4tTnssA== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4SzKvf2v7szSP; Mon, 25 Dec 2023 14:26:06 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.17.1/8.17.1) with ESMTP id 3BPEQ6Ei024030; Mon, 25 Dec 2023 14:26:06 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.17.1/8.17.1/Submit) id 3BPEQ6g5024027; Mon, 25 Dec 2023 14:26:06 GMT (envelope-from git) Date: Mon, 25 Dec 2023 14:26:06 GMT Message-Id: <202312251426.3BPEQ6g5024027@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Robert Clausecker Subject: git: fb197a4f7751 - main - lib/libc/amd64/string: add memrchr() scalar, baseline implementation List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: fuz X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: fb197a4f7751bb4e116989e57ba7fb12a981895f Auto-Submitted: auto-generated The branch main has been updated by fuz: URL: https://cgit.FreeBSD.org/src/commit/?id=fb197a4f7751bb4e116989e57ba7fb12a981895f commit fb197a4f7751bb4e116989e57ba7fb12a981895f Author: Robert Clausecker AuthorDate: 2023-12-06 10:05:47 +0000 Commit: Robert Clausecker CommitDate: 2023-12-25 14:00:05 +0000 lib/libc/amd64/string: add memrchr() scalar, baseline implementation The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be improved by using the same algorithm as for memchr, but it would have been a lot more complicated. The baseline implementation is similar to timingsafe_memcmp. It's slightly slower than memchr() due to the more complicated main loop, but I don't think that can be significantly improved. Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42925 --- lib/libc/amd64/string/Makefile.inc | 1 + lib/libc/amd64/string/memrchr.S | 166 +++++++++++++++++++++++++++++++++++++ 2 files changed, 167 insertions(+) diff --git a/lib/libc/amd64/string/Makefile.inc b/lib/libc/amd64/string/Makefile.inc index a14e8a768f01..b1369841bc74 100644 --- a/lib/libc/amd64/string/Makefile.inc +++ b/lib/libc/amd64/string/Makefile.inc @@ -6,6 +6,7 @@ MDSRCS+= \ memccpy.S \ memcpy.S \ memmove.S \ + memrchr.S \ memset.S \ stpcpy.S \ stpncpy.S \ diff --git a/lib/libc/amd64/string/memrchr.S b/lib/libc/amd64/string/memrchr.S new file mode 100644 index 000000000000..4f6c5a238daa --- /dev/null +++ b/lib/libc/amd64/string/memrchr.S @@ -0,0 +1,166 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause + * + * Copyright (c) 2023 Robert Clausecker + */ + +#include + +#include "amd64_archlevel.h" + +#define ALIGN_TEXT .p2align 4, 0x90 + +ARCHFUNCS(memrchr) + ARCHFUNC(memrchr, scalar) + ARCHFUNC(memrchr, baseline) +ENDARCHFUNCS(memrchr) + +ARCHENTRY(memrchr, scalar) + xor %eax, %eax # prospective return value + sub $4, %rdx # 4 bytes left to process? + jb 1f + + ALIGN_TEXT +0: xor %r8, %r8 + lea 2(%rdi), %r10 + cmp %sil, 2(%rdi) + cmovne %r8, %r10 # point to null if no match + + cmp %sil, (%rdi) + cmove %rdi, %r8 # point to first char if match + + lea 1(%rdi), %r9 + cmp %sil, 1(%rdi) + cmovne %r8, %r9 # point to first result if no match in second + + lea 3(%rdi), %r11 + cmp %sil, 3(%rdi) + cmovne %r10, %r11 + + test %r11, %r11 + cmovz %r9, %r11 # take first pair match if none in second + + test %r11, %r11 + cmovnz %r11, %rax # take match in current set if any + + add $4, %rdi + sub $4, %rdx + jae 0b + +1: cmp $-3, %edx # a least one character left to process? + jb 2f + + cmp %sil, (%rdi) + cmove %rdi, %rax + + lea 1(%rdi), %rcx + cmp $-2, %edx # at least two characters left to process? + jb 2f + + cmp %sil, 1(%rdi) + cmove %rcx, %rax + + lea 2(%rdi), %rcx + cmp $-1, %edx # at least three character left to process? + jb 2f + + cmp %sil, 2(%rdi) + cmove %rcx, %rax + +2: ret +ARCHEND(memrchr, scalar) + +ARCHENTRY(memrchr, baseline) + movd %esi, %xmm4 + test %rdx, %rdx # empty buffer? + jz .L0 # if yes, return immediately + + punpcklbw %xmm4, %xmm4 # c -> cc + mov %edi, %ecx + punpcklwd %xmm4, %xmm4 # cc -> cccc + and $~0xf, %rdi # align source pointer + pshufd $0, %xmm4, %xmm4 # cccc -> cccccccccccccccc + and $0xf, %ecx + movdqa %xmm4, %xmm0 + mov $-1, %r8d + pcmpeqb (%rdi), %xmm0 # compare aligned head + shl %cl, %r8d # mask of bytes in the head of the buffer + pmovmskb %xmm0, %eax + + sub $16, %rcx + and %r8d, %eax # match mask + add %rcx, %rdx # advance past head + cmc + jbe .Lrunt # did the string end in the buffer? + + mov %rdi, %rsi # pointer to matching chunk + add $16, %rdi + sub $16, %rdx # enough left for another round? + jbe 1f + + /* main loop unrolled twice */ + ALIGN_TEXT +0: movdqa %xmm4, %xmm0 + pcmpeqb (%rdi), %xmm0 + pmovmskb %xmm0, %r8d + + cmp $16, %rdx # enough left for second chunk? + jbe 2f + + movdqa %xmm4, %xmm0 + pcmpeqb 16(%rdi), %xmm0 + pmovmskb %xmm0, %ecx + + lea 16(%rdi), %r9 + test %ecx, %ecx # match found in second chunk? + cmovz %r8d, %ecx # if not, use match data from first chunk + cmovz %rdi, %r9 + + test %ecx, %ecx # any match found? + cmovnz %ecx, %eax # if yes, overwrite previously found match + cmovnz %r9, %rsi + + add $32, %rdi # advance to next iteration + sub $32, %rdx # advance to next chunks + ja 0b + + /* process remaining 1--16 bytes */ +1: pcmpeqb (%rdi), %xmm4 + mov $0xffff, %r8d + xor %ecx, %ecx + sub %edx, %ecx # number of bytes to be masked out + pmovmskb %xmm4, %r9d + shr %cl, %r8d # mask of bytes to be kept in the buffer + and %r9d, %r8d + cmovnz %r8d, %eax + cmovnz %rdi, %rsi + bsr %eax, %eax + lea (%rsi, %rax, 1), %rsi # pointer to match (or junk) + cmovnz %rsi, %rax # if any match was found, return it + ret + + /* end of chunk reached within first half iteration */ +2: test %r8d, %r8d # match in previous chunk? + cmovnz %r8d, %eax # if yes, overwrite previous chunks + cmovnz %rdi, %rsi + add $16, %rdi # point to tail + sub $16, %edx + jmp 1b # handle tail the same otherwise + + /* runt: string ends within head, edx has negated amount of invalid head bytes */ +.Lrunt: mov $0xffff, %r8d + xor %ecx, %ecx + sub %edx, %ecx + shr %cl, %r8d + and %r8d, %eax + bsr %eax, %eax + lea (%rdi, %rax, 1), %rdi + cmovnz %rdi, %rax + ret + + /* empty buffer: return a null pointer */ +.L0: xor %eax, %eax + ret +ARCHEND(memrchr, baseline) + + .section .note.GNU-stack, "", %progbits