From nobody Wed Jan 24 19:45:00 2024 X-Original-To: dev-commits-src-branches@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TKvYm5X9Jz58bf6; Wed, 24 Jan 2024 19:45:00 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4TKvYm37G0z46HY; Wed, 24 Jan 2024 19:45:00 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1706125500; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=SiJB/Q0x9aSaKSJuS7hfOH3dB4o9lLJjZoMqCclzisE=; b=tw/YJ9/Qq83gBoJ0l00o/ItQbB/fb7t9afb1BvXiQ9NEvJkT+QmQWzoIk1RwqPJm/IfQwC WvN4hPoBtHVCB6FWAHshU8NTyuthnin6NDAcV9Ox7jGNS9BBvTLbQkJXo0h6ZXVlxzqLyf 8NCZt6hH6pM8PcaUlSiBItWdU/OeN8xjXNumKq/M/C2TTqAnQQ4oMxbzbUKgSspl3YLb+e elWftrGA0v9vP3oUCbVGJltUg2UUka72stiVYpGEfG2g3Y4pip9m5K1wnWOsUvAXGowP72 Rg0s0A53KgkQWA5Cfx3FMgb7WWLdt7+ou3iNHKDkP0dxeP8WOrzyUSBGpjaYhA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1706125500; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=SiJB/Q0x9aSaKSJuS7hfOH3dB4o9lLJjZoMqCclzisE=; b=NoJec3fAZV9CCmjStf38aQYI6eAXZTUTmqEEVCLk+V99bBxPZv2idhPlkiIPXwy7EyzWq5 Af6y9zJ5NVkngYjJcOcY99kzTDNgCYHAcG+t11r4hmU7oSG4di7pjuraM2nF+dCI3mNC83 EL8XgXUQ2Om5fIhvmxyt1UPvGB73YgTvcNvKcMRfLuZ62H7DJbMFAyxR2XOaYB0adk9Q0N VaaAW1QOui0nfISkmgV98wZZk3geRjz0+qm2vV90omc2xaA+vYfZjuU3RnCMnHqkzPTXSr ouMNlklkd4gxoJ7ozWWR3k022IlHL0XmoDSCiHBeCN44B3Tf30ZUjswyj6DM+Q== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1706125500; a=rsa-sha256; cv=none; b=Yso3o0n5jxQry5Nyzf9kR491JEcRyWs0FSzHlPMDHpI5IrY1vm7o14k5grXDRq1sO628gw Rx10fSgV9AHVzQIykXv2GH6u3NZF3NOWkvyFtz2kruQkSdrxAc28o3wQHDz+MPn/x9H3Ky DWr/ITsvTf4E2n7EV/V8hr/2sBwAwt7l5H2R5+Ny0XLKXzy5S7uozvK2UvNkyEiLKePHms GIzFsTyaDdYUeiB1pa9FFwsDlFFJIJ84zFsTHXYceBHO0EfvBfz6I1dG1Cnf7WZoWjKjNQ CsH2BMWu3Fk5AsQHA4jMjYhendFIRQDJ9UG4kMPZrmAGUo2fdZqR9MOOEG+c0w== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4TKvYm2CByzRgq; Wed, 24 Jan 2024 19:45:00 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.17.1/8.17.1) with ESMTP id 40OJj0GZ006129; Wed, 24 Jan 2024 19:45:00 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.17.1/8.17.1/Submit) id 40OJj03c006126; Wed, 24 Jan 2024 19:45:00 GMT (envelope-from git) Date: Wed, 24 Jan 2024 19:45:00 GMT Message-Id: <202401241945.40OJj03c006126@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org From: Robert Clausecker Subject: git: ff7799e00311 - stable/14 - lib/libc/amd64/string: add memrchr() scalar, baseline implementation List-Id: Commits to the stable branches of the FreeBSD src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-branches@freebsd.org X-BeenThere: dev-commits-src-branches@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: fuz X-Git-Repository: src X-Git-Refname: refs/heads/stable/14 X-Git-Reftype: branch X-Git-Commit: ff7799e003110a8188bc129e663d23b54755531f Auto-Submitted: auto-generated The branch stable/14 has been updated by fuz: URL: https://cgit.FreeBSD.org/src/commit/?id=ff7799e003110a8188bc129e663d23b54755531f commit ff7799e003110a8188bc129e663d23b54755531f Author: Robert Clausecker AuthorDate: 2023-12-06 10:05:47 +0000 Commit: Robert Clausecker CommitDate: 2024-01-24 19:39:31 +0000 lib/libc/amd64/string: add memrchr() scalar, baseline implementation The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be improved by using the same algorithm as for memchr, but it would have been a lot more complicated. The baseline implementation is similar to timingsafe_memcmp. It's slightly slower than memchr() due to the more complicated main loop, but I don't think that can be significantly improved. Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42925 (cherry picked from commit fb197a4f7751bb4e116989e57ba7fb12a981895f) --- lib/libc/amd64/string/Makefile.inc | 1 + lib/libc/amd64/string/memrchr.S | 166 +++++++++++++++++++++++++++++++++++++ 2 files changed, 167 insertions(+) diff --git a/lib/libc/amd64/string/Makefile.inc b/lib/libc/amd64/string/Makefile.inc index a14e8a768f01..b1369841bc74 100644 --- a/lib/libc/amd64/string/Makefile.inc +++ b/lib/libc/amd64/string/Makefile.inc @@ -6,6 +6,7 @@ MDSRCS+= \ memccpy.S \ memcpy.S \ memmove.S \ + memrchr.S \ memset.S \ stpcpy.S \ stpncpy.S \ diff --git a/lib/libc/amd64/string/memrchr.S b/lib/libc/amd64/string/memrchr.S new file mode 100644 index 000000000000..4f6c5a238daa --- /dev/null +++ b/lib/libc/amd64/string/memrchr.S @@ -0,0 +1,166 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause + * + * Copyright (c) 2023 Robert Clausecker + */ + +#include + +#include "amd64_archlevel.h" + +#define ALIGN_TEXT .p2align 4, 0x90 + +ARCHFUNCS(memrchr) + ARCHFUNC(memrchr, scalar) + ARCHFUNC(memrchr, baseline) +ENDARCHFUNCS(memrchr) + +ARCHENTRY(memrchr, scalar) + xor %eax, %eax # prospective return value + sub $4, %rdx # 4 bytes left to process? + jb 1f + + ALIGN_TEXT +0: xor %r8, %r8 + lea 2(%rdi), %r10 + cmp %sil, 2(%rdi) + cmovne %r8, %r10 # point to null if no match + + cmp %sil, (%rdi) + cmove %rdi, %r8 # point to first char if match + + lea 1(%rdi), %r9 + cmp %sil, 1(%rdi) + cmovne %r8, %r9 # point to first result if no match in second + + lea 3(%rdi), %r11 + cmp %sil, 3(%rdi) + cmovne %r10, %r11 + + test %r11, %r11 + cmovz %r9, %r11 # take first pair match if none in second + + test %r11, %r11 + cmovnz %r11, %rax # take match in current set if any + + add $4, %rdi + sub $4, %rdx + jae 0b + +1: cmp $-3, %edx # a least one character left to process? + jb 2f + + cmp %sil, (%rdi) + cmove %rdi, %rax + + lea 1(%rdi), %rcx + cmp $-2, %edx # at least two characters left to process? + jb 2f + + cmp %sil, 1(%rdi) + cmove %rcx, %rax + + lea 2(%rdi), %rcx + cmp $-1, %edx # at least three character left to process? + jb 2f + + cmp %sil, 2(%rdi) + cmove %rcx, %rax + +2: ret +ARCHEND(memrchr, scalar) + +ARCHENTRY(memrchr, baseline) + movd %esi, %xmm4 + test %rdx, %rdx # empty buffer? + jz .L0 # if yes, return immediately + + punpcklbw %xmm4, %xmm4 # c -> cc + mov %edi, %ecx + punpcklwd %xmm4, %xmm4 # cc -> cccc + and $~0xf, %rdi # align source pointer + pshufd $0, %xmm4, %xmm4 # cccc -> cccccccccccccccc + and $0xf, %ecx + movdqa %xmm4, %xmm0 + mov $-1, %r8d + pcmpeqb (%rdi), %xmm0 # compare aligned head + shl %cl, %r8d # mask of bytes in the head of the buffer + pmovmskb %xmm0, %eax + + sub $16, %rcx + and %r8d, %eax # match mask + add %rcx, %rdx # advance past head + cmc + jbe .Lrunt # did the string end in the buffer? + + mov %rdi, %rsi # pointer to matching chunk + add $16, %rdi + sub $16, %rdx # enough left for another round? + jbe 1f + + /* main loop unrolled twice */ + ALIGN_TEXT +0: movdqa %xmm4, %xmm0 + pcmpeqb (%rdi), %xmm0 + pmovmskb %xmm0, %r8d + + cmp $16, %rdx # enough left for second chunk? + jbe 2f + + movdqa %xmm4, %xmm0 + pcmpeqb 16(%rdi), %xmm0 + pmovmskb %xmm0, %ecx + + lea 16(%rdi), %r9 + test %ecx, %ecx # match found in second chunk? + cmovz %r8d, %ecx # if not, use match data from first chunk + cmovz %rdi, %r9 + + test %ecx, %ecx # any match found? + cmovnz %ecx, %eax # if yes, overwrite previously found match + cmovnz %r9, %rsi + + add $32, %rdi # advance to next iteration + sub $32, %rdx # advance to next chunks + ja 0b + + /* process remaining 1--16 bytes */ +1: pcmpeqb (%rdi), %xmm4 + mov $0xffff, %r8d + xor %ecx, %ecx + sub %edx, %ecx # number of bytes to be masked out + pmovmskb %xmm4, %r9d + shr %cl, %r8d # mask of bytes to be kept in the buffer + and %r9d, %r8d + cmovnz %r8d, %eax + cmovnz %rdi, %rsi + bsr %eax, %eax + lea (%rsi, %rax, 1), %rsi # pointer to match (or junk) + cmovnz %rsi, %rax # if any match was found, return it + ret + + /* end of chunk reached within first half iteration */ +2: test %r8d, %r8d # match in previous chunk? + cmovnz %r8d, %eax # if yes, overwrite previous chunks + cmovnz %rdi, %rsi + add $16, %rdi # point to tail + sub $16, %edx + jmp 1b # handle tail the same otherwise + + /* runt: string ends within head, edx has negated amount of invalid head bytes */ +.Lrunt: mov $0xffff, %r8d + xor %ecx, %ecx + sub %edx, %ecx + shr %cl, %r8d + and %r8d, %eax + bsr %eax, %eax + lea (%rdi, %rax, 1), %rdi + cmovnz %rdi, %rax + ret + + /* empty buffer: return a null pointer */ +.L0: xor %eax, %eax + ret +ARCHEND(memrchr, baseline) + + .section .note.GNU-stack, "", %progbits