git: 521c1fe0e200 - main - libc/aarch64: fix strlen() when flush-to-zero is set

From: Robert Clausecker <fuz_at_FreeBSD.org>
Date: Thu, 16 Jan 2025 01:23:01 UTC

The branch main has been updated by fuz:

URL: https://cgit.FreeBSD.org/src/commit/?id=521c1fe0e2002dfd7d8db86eb7144b7865229912

commit 521c1fe0e2002dfd7d8db86eb7144b7865229912
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2025-01-13 13:41:41 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2025-01-16 01:20:30 +0000

    libc/aarch64: fix strlen() when flush-to-zero is set
    
    Our SIMD-enhanced strlen() implementation for AArch64 uses
    a floating-point comparison to compare a bit mask to zero.
    This works fine under normal circumstances, but fails if
    the FZ (flush-to-zero) flag is set in FPCR (the floating-point
    control register) as then the CPU no longer distinguishes
    denormals from zero.
    
    This was not caught during testing; this flag is rarely set
    and programs that do so rarely perform string manipulation.
    
    Avoid this problem by using an integer comparison instead.
    The performance impact seems to be small (about 0.5%) on
    the Windows Dev Kit 2023, but more significant (up to
    around 19%) on the RPi 5.
    
    Reviewed by:    getz
    Fixes:          3863fec1ce2dc6033f094a085118605ea89db9e2
    Differential Revision:  https://reviews.freebsd.org/D48442
---
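Note: the failure mode described above can be sketched in portable C.
This is only an illustrative model, not code from the commit. The
helper names (is_fp_zero, is_fp_subnormal, old_test_continues,
new_test_continues) are hypothetical, and the model assumes that with
FZ set, fcmp treats a subnormal operand as zero, as the commit message
describes. After cmeq/shrn, each input byte contributes one nibble
(0xf for a NUL byte, 0x0 otherwise) to the 64-bit mask, so a NUL in
bytes 0 to 12 only gives a bit pattern whose double-precision exponent
field is zero, i.e. a subnormal.

```c
#include <stdbool.h>
#include <stdint.h>

/* IEEE 754 binary64 fields: bit 63 sign, bits 62..52 exponent, bits 51..0 fraction. */
#define EXP_MASK	UINT64_C(0x7ff0000000000000)
#define FRAC_MASK	UINT64_C(0x000fffffffffffff)

/* True if the bit pattern encodes +0.0 or -0.0. */
static bool
is_fp_zero(uint64_t bits)
{
	return ((bits & (EXP_MASK | FRAC_MASK)) == 0);
}

/* True if the bit pattern encodes a subnormal (exponent zero, fraction nonzero). */
static bool
is_fp_subnormal(uint64_t bits)
{
	return ((bits & EXP_MASK) == 0 && (bits & FRAC_MASK) != 0);
}

/*
 * Model of the old test "fcmp d0, #0.0; b.eq .Lloop".
 * Returns true if the loop would continue.  Assumption: with fz set,
 * a subnormal operand compares equal to zero.
 */
static bool
old_test_continues(uint64_t mask, bool fz)
{
	return (is_fp_zero(mask) || (fz && is_fp_subnormal(mask)));
}

/* Model of the new test "cbz x1, .Lloop": a plain integer comparison. */
static bool
new_test_continues(uint64_t mask)
{
	return (mask == 0);
}
```

For example, a chunk whose only NUL is at byte 3 yields the mask
0xf000. In this model, old_test_continues(0xf000, true) reports that
the loop keeps going past the terminator, while
new_test_continues(0xf000) stops regardless of FZ.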
 lib/libc/aarch64/string/strlen.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/libc/aarch64/string/strlen.S b/lib/libc/aarch64/string/strlen.S
index 7bfac7f4b1e1..6fefc252eca1 100644
--- a/lib/libc/aarch64/string/strlen.S
+++ b/lib/libc/aarch64/string/strlen.S
@@ -33,9 +33,8 @@ ENTRY(__strlen)
 	ldr	q0, [x10, #16]!
 	cmeq	v0.16b, v0.16b, #0
 	shrn	v0.8b, v0.8h, #4	// reduce to fit mask in GPR
-	fcmp	d0, #0.0
-	b.eq	.Lloop
 	fmov	x1, d0
+	cbz	x1, .Lloop
 .Ldone:
 	sub	x0, x10, x0
 	rbit	x1, x1			// reverse bits as NEON has no ctz