From nobody Mon Aug 21 19:28:45 2023 Date: Mon, 21 Aug 2023 19:28:45 GMT Message-Id: <202308211928.37LJSjW3031110@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Robert Clausecker Subject: git: 8803f01e9322 - main - lib/libc/amd64/string/memcmp.S: add baseline implementation List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: 
dev-commits-src-main@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: fuz X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 8803f01e932275cd405690526bb8dba031a02ffe Auto-Submitted: auto-generated The branch main has been updated by fuz: URL: https://cgit.FreeBSD.org/src/commit/?id=8803f01e932275cd405690526bb8dba031a02ffe commit 8803f01e932275cd405690526bb8dba031a02ffe Author: Robert Clausecker AuthorDate: 2023-07-12 13:35:13 +0000 Commit: Robert Clausecker CommitDate: 2023-08-21 19:19:46 +0000 lib/libc/amd64/string/memcmp.S: add baseline implementation This changeset adds a baseline implementation of memcmp and bcmp for amd64. The same code is used for both functions with conditional code where the behaviour differs (we need more precise output for the memcmp case). FreeBSD documents that memcmp returns the difference between the mismatching characters. Slightly faster code would be possible could we relax this requirement to the ISO/IEC 9899:1999 requirement of merely returning a negative/positive integer or zero. Performance is better than bionic and glibc, except for long strings where the two are 13% faster. This could be because they use SSE4 ptest which we cannot use in a baseline kernel. Sponsored by: The FreeBSD Foundation Approved by: mjg Differential Revision: https://reviews.freebsd.org/D41442 --- lib/libc/amd64/string/memcmp.S | 181 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 175 insertions(+), 6 deletions(-) diff --git a/lib/libc/amd64/string/memcmp.S b/lib/libc/amd64/string/memcmp.S index fea5cebc65f2..d192229677b3 100644 --- a/lib/libc/amd64/string/memcmp.S +++ b/lib/libc/amd64/string/memcmp.S @@ -1,9 +1,12 @@ /*- - * Copyright (c) 2018 The FreeBSD Foundation + * Copyright (c) 2018, 2023 The FreeBSD Foundation * * This software was developed by Mateusz Guzik * under sponsorship from the FreeBSD Foundation. 
* + * Portions of this software were developed by Robert Clausecker + * under sponsorship from the FreeBSD Foundation. + * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: @@ -27,6 +30,10 @@ */ #include +#include + +#include "amd64_archlevel.h" + /* * Note: this routine was written with kernel use in mind (read: no simd), * it is only present in userspace as a temporary measure until something @@ -36,10 +43,15 @@ #define ALIGN_TEXT .p2align 4,0x90 /* 16-byte alignment, nop filled */ #ifdef BCMP -ENTRY(bcmp) -#else -ENTRY(memcmp) +#define memcmp bcmp #endif + +ARCHFUNCS(memcmp) + ARCHFUNC(memcmp, scalar) + ARCHFUNC(memcmp, baseline) +ENDARCHFUNCS(memcmp) + +ARCHENTRY(memcmp, scalar) xorl %eax,%eax 10: cmpq $16,%rdx @@ -157,7 +169,6 @@ ENTRY(memcmp) 1: leal 1(%eax),%eax ret -END(bcmp) #else /* * We need to compute the difference between strings. @@ -230,7 +241,165 @@ END(bcmp) 2: subl %r8d,%eax ret -END(memcmp) #endif +ARCHEND(memcmp, scalar) + +ARCHENTRY(memcmp, baseline) + cmp $32, %rdx # enough to permit use of the long kernel? + ja .Llong + + test %rdx, %rdx # zero bytes buffer? + je .L0 + + /* + * Compare strings of 1--32 bytes. We want to do this by + * loading into two xmm registers and then comparing. To avoid + * crossing into unmapped pages, we either load 32 bytes from + * the start of the buffer or 32 bytes before its end, depending + * on whether there is a page boundary between the overread area + * or not. + */ + + /* check for page boundaries overreads */ + lea 31(%rdi), %eax # end of overread + lea 31(%rsi), %r8d + lea -1(%rdi, %rdx, 1), %ecx # last character in buffer + lea -1(%rsi, %rdx, 1), %r9d + xor %ecx, %eax + xor %r9d, %r8d + test $PAGE_SIZE, %eax # are they on different pages? 
+ jz 0f + + /* fix up rdi */ + movdqu -32(%rdi, %rdx, 1), %xmm0 + movdqu -16(%rdi, %rdx, 1), %xmm1 + lea -8(%rsp), %rdi # end of replacement buffer + sub %rdx, %rdi # start of replacement buffer + movdqa %xmm0, -40(%rsp) # copy to replacement buffer + movdqa %xmm1, -24(%rsp) + +0: test $PAGE_SIZE, %r8d + jz 0f + + /* fix up rsi */ + movdqu -32(%rsi, %rdx, 1), %xmm0 + movdqu -16(%rsi, %rdx, 1), %xmm1 + lea -40(%rsp), %rsi # end of replacement buffer + sub %rdx, %rsi # start of replacement buffer + movdqa %xmm0, -72(%rsp) # copy to replacement buffer + movdqa %xmm1, -56(%rsp) + + /* load data and compare properly */ +0: movdqu 16(%rdi), %xmm1 + movdqu 16(%rsi), %xmm3 + movdqu (%rdi), %xmm0 + movdqu (%rsi), %xmm2 + mov %edx, %ecx + mov $-1, %edx + shl %cl, %rdx # ones where the buffer is not + pcmpeqb %xmm3, %xmm1 + pcmpeqb %xmm2, %xmm0 + pmovmskb %xmm1, %ecx + pmovmskb %xmm0, %eax + shl $16, %ecx + or %ecx, %eax # ones where the buffers match + or %edx, %eax # including where the buffer is not + not %eax # ones where there is a mismatch +#ifndef BCMP + bsf %eax, %edx # location of the first mismatch + cmovz %eax, %edx # including if there is no mismatch + movzbl (%rdi, %rdx, 1), %eax # mismatching bytes + movzbl (%rsi, %rdx, 1), %edx + sub %edx, %eax +#endif + ret + + /* empty input */ +.L0: xor %eax, %eax + ret + + /* compare 33+ bytes */ + ALIGN_TEXT +.Llong: movdqu (%rdi), %xmm0 # load head + movdqu (%rsi), %xmm2 + mov %rdi, %rcx + sub %rdi, %rsi # express rsi as distance from rdi + and $~0xf, %rdi # align rdi to 16 bytes + movdqu 16(%rsi, %rdi, 1), %xmm1 + pcmpeqb 16(%rdi), %xmm1 # compare second half of this iteration + add %rcx, %rdx # pointer to last byte in buffer + pcmpeqb %xmm2, %xmm0 + pmovmskb %xmm0, %eax + xor $0xffff, %eax # any mismatch? 
+ jne .Lmismatch_head + add $64, %rdi # advance to next iteration + jmp 1f # and get going with the loop + + /* process buffer 32 bytes at a time */ + ALIGN_TEXT +0: movdqu -32(%rsi, %rdi, 1), %xmm0 + movdqu -16(%rsi, %rdi, 1), %xmm1 + pcmpeqb -32(%rdi), %xmm0 + pcmpeqb -16(%rdi), %xmm1 + add $32, %rdi # advance to next iteration +1: pand %xmm0, %xmm1 # 0xff where both halves matched + pmovmskb %xmm1, %eax + cmp $0xffff, %eax # all bytes matched? + jne .Lmismatch + cmp %rdx, %rdi # end of buffer reached? + jb 0b + + /* less than 32 bytes left to compare */ + movdqu -16(%rdx), %xmm1 # load 32 byte tail through end pointer + movdqu -16(%rdx, %rsi, 1), %xmm3 + movdqu -32(%rdx), %xmm0 + movdqu -32(%rdx, %rsi, 1), %xmm2 + pcmpeqb %xmm3, %xmm1 + pcmpeqb %xmm2, %xmm0 + pmovmskb %xmm1, %ecx + pmovmskb %xmm0, %eax + shl $16, %ecx + or %ecx, %eax # ones where the buffers match + not %eax # ones where there is a mismatch +#ifndef BCMP + bsf %eax, %ecx # location of the first mismatch + cmovz %eax, %ecx # including if there is no mismatch + add %rcx, %rdx # pointer to potential mismatch + movzbl -32(%rdx), %eax # mismatching bytes + movzbl -32(%rdx, %rsi, 1), %edx + sub %edx, %eax +#endif + ret + +#ifdef BCMP +.Lmismatch: + mov $1, %eax +.Lmismatch_head: + ret +#else /* memcmp */ +.Lmismatch_head: + tzcnt %eax, %eax # location of mismatch + add %rax, %rcx # pointer to mismatch + movzbl (%rcx), %eax # mismatching bytes + movzbl (%rcx, %rsi, 1), %ecx + sub %ecx, %eax + ret + +.Lmismatch: + movdqu -48(%rsi, %rdi, 1), %xmm1 + pcmpeqb -48(%rdi), %xmm1 # reconstruct xmm1 before PAND + pmovmskb %xmm0, %eax # mismatches in first 16 bytes + pmovmskb %xmm1, %edx # mismatches in second 16 bytes + shl $16, %edx + or %edx, %eax # mismatches in both + not %eax # matches in both + tzcnt %eax, %eax # location of mismatch + add %rax, %rdi # pointer to mismatch + movzbl -64(%rdi), %eax # mismatching bytes + movzbl -64(%rdi, %rsi, 1), %ecx + sub %ecx, %eax + ret +#endif +ARCHEND(memcmp, 
baseline) .section .note.GNU-stack,"",%progbits
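
For readers following the 1--32 byte path in the baseline kernel, the mask logic can be modelled in portable C. This is only a rough sketch and not part of the commit: the function name memcmp_short_model is hypothetical, a scalar loop stands in for the pcmpeqb/pmovmskb pair, and because the loop never reads past the buffer, the page-boundary fixup from the assembly is not modelled at all. It assumes the caller passes a length of at most 32.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the short (1..32 byte) path, for illustration only.
 * Precondition (assumption): n <= 32.
 */
static int
memcmp_short_model(const void *a, const void *b, size_t n)
{
	const unsigned char *p = a;
	const unsigned char *q = b;
	uint32_t match = 0;
	uint32_t mismatch;
	size_t i;

	if (n == 0)
		return (0);		/* empty input compares equal */

	/* one bit per byte position, set where the buffers agree */
	for (i = 0; i < n; i++)
		if (p[i] == q[i])
			match |= (uint32_t)1 << i;

	/* treat every position past the end of the buffer as matching */
	if (n < 32)
		match |= ~(uint32_t)0 << n;

	mismatch = ~match;		/* ones where there is a mismatch */
	if (mismatch == 0)
		return (0);

	/* the lowest set bit is the location of the first mismatch */
	for (i = 0; (mismatch & 1) == 0; i++)
		mismatch >>= 1;

	/* memcmp flavour: difference between the mismatching bytes */
	return (p[i] - q[i]);
}
```

A bcmp flavour of the same sketch could simply return the mismatch mask, since any nonzero value signals a difference there.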