Performance improvement to strnlen().
Václav Zeman
vhaisman at gmail.com
Sun May 26 06:30:40 UTC 2013
On 05/25/2013 10:27 PM, Lee Thomas wrote:
> Hello FreeBSD devs,
>
> I have found a performance improvement to libc's strnlen().
> lib/libc/string/strnlen.c is a trivial byte-by-byte implementation,
> where strlen.c has a smarter word-by-word implementation. I have
> implemented strnlen similarly to strlen. It runs about 4x as fast, at
> the cost of a binary codesize increase from 30 bytes to 221.
>
> In writing this I needed a test, and there isn't one in
> tools/regression/lib/libc/string, so I wrote a unit test for strnlen.
> This really only makes sense for a word-by-word implementation, as it
> tests each combination of string and limit length, overread characters,
> and starting alignment.
>
> Could someone please review these patches? I am not very experienced in
> C, and am even less experienced with FreeBSD's style guidelines, so they
> likely have a bunch of style and idiom issues. Even if one or both of
> them prove not worth committing, it would still be useful for my learning.
>
> If/When these patches prove worthy of submitting, what is the next step
> after that? Should I submit a PR, or is there some other process? This
> code is quite similar to the existing strlen.c, and doesn't do anything
> fancy with e.g. endianness, but I haven't tested it for correctness or
> performance on anything other than x86...
>
> And finally, there is some other low-hanging fruit in the other strn*
> functions. Would it be worth it for me to give those the same treatment?
>
> Thanks,
> Lee Thomas
>
> Test platform:
> $uname -a
> FreeBSD LeeDesktop 9.1-STABLE FreeBSD 9.1-STABLE #15 r250831:
> Mon May 20 17:31:28 EDT 2013
> lthomas at LeeDesktop:/usr/obj/usr/src/sys/NOSOUND amd64
> $dmesg | grep CPU:
> CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
> (2666.81-MHz K8-class CPU)
> $clang --version
> FreeBSD clang version 3.2 (tags/RELEASE_32/final 170710) 20121221
> Target: x86_64-unknown-freebsd9.1
> Thread model: posix
>
> My timing test was a file of 10000 strings, of uniformly random length,
> 50% between 0 and 20 bytes long, and 50% between 21 and 1000 bytes long.
> Each was filled with random generated bytes in the range [1, 255]. The
> test executables run their strlen on each string in the file 1000 times,
> and a shell script ran the test programs alternately 100 times. Here are
> the clang results; gcc results are roughly the same. I will share the
> actual timing code if someone wants it, but it is pretty ugly C++ and
> shell and I don't guarantee it working :-).
>
> Results:
>
> x ./times_bsd_strnlen.txt
> + ./times_my_strnlen.txt
> +--------------------------------------------------------------------------+
>
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |+ x|
> |A A|
> +--------------------------------------------------------------------------+
>
> N Min Max Median Avg Stddev
> x 101 1.8685486 1.8693889 1.8687506 1.8687894 0.0001484903
> + 101 0.42704683 0.42779486 0.42712813 0.42715597 0.00010681494
> Difference at 95.0% confidence
> -1.44163 +/- 3.56739e-05
> -77.1426% +/- 0.00190893%
> (Student's t, pooled s = 0.000129342)
>
> Note: I manually shortened the histogram, as it was 100 lines long of
> exactly the same.
>
>
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
>
> Index: strnlen.c
> ===================================================================
> diff --git a/head/lib/libc/string/strnlen.c b/head/lib/libc/string/strnlen.c
> --- a/head/lib/libc/string/strnlen.c (revision 250951)
> +++ b/head/lib/libc/string/strnlen.c (working copy)
> @@ -1,5 +1,6 @@
> /*-
> - * Copyright (c) 2009 David Schultz <das at FreeBSD.org>
> + * Copyright (c) 2009, 2010 Xin LI <delphij at FreeBSD.org>
> + * Copyright (c) 2013 Lee Thomas <lee_thomas at AslanTools.com>
> * All rights reserved.
> *
> * Redistribution and use in source and binary forms, with or without
> @@ -27,16 +28,91 @@
> #include <sys/cdefs.h>
> __FBSDID("$FreeBSD$");
>
> +#include <sys/limits.h>
> +#include <sys/types.h>
> #include <string.h>
>
> +/*
> + * Portable strnlen() for 32-bit and 64-bit systems.
> + *
> + * Rationale: it is generally much more efficient to do word length
> + * operations and avoid branches on modern computer systems, as
> + * compared to byte-length operations with a lot of branches.
> + *
> + * The expression:
> + *
> + * ((x - 0x01....01) & ~x & 0x80....80)
> + *
> + * would evaluate to a non-zero value iff any of the bytes in the
> + * original word is zero.
> + *
> + * On multi-issue processors, we can divide the above expression into:
> + * a) (x - 0x01....01)
> + * b) (~x & 0x80....80)
> + * c) a & b
> + *
> + * Where, a) and b) can be partially computed in parallel.
> + *
> + * The algorithm above is found on "Hacker's Delight" by
> + * Henry S. Warren, Jr.
> + */
> +
> +/* Magic numbers for the algorithm */
> +#if LONG_BIT == 32
> +static const unsigned long mask01 = 0x01010101;
> +static const unsigned long mask80 = 0x80808080;
> +#elif LONG_BIT == 64
> +static const unsigned long mask01 = 0x0101010101010101;
> +static const unsigned long mask80 = 0x8080808080808080;
> +#else
> +#error Unsupported word size
> +#endif
> +
> +#define LONGPTR_MASK (sizeof(long) - 1)
> +
> size_t
> -strnlen(const char *s, size_t maxlen)
> +strnlen(const char *str, size_t maxlen)
> {
> - size_t len;
> + const char *stop, *short_stop;
> + const char *p;
> + const unsigned long *lp;
> + long va, vb;
>
> - for (len = 0; len < maxlen; len++, s++) {
> - if (!*s)
> - break;
> + if (maxlen==0) return 0;
> +
> + stop=str+maxlen;
> + /*
> + * Before trying the hard (unaligned byte-by-byte access) way
> + * to figure out whether there is a nul character, try to see
> + * if there is a nul character is within this accessible word
> + * first.
> + *
> + * p and (p & ~LONGPTR_MASK) must be equally accessible since
> + * they always fall in the same memory page, as long as page
> + * boundaries is integral multiple of word size.
> + */
> + lp = (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);
> + va = (*lp - mask01);
> + vb = ((~*lp) & mask80);
I do not think that this correct C. This is type punning violating the
rules of the language.
One way you can rewrite this is to use memcpy() into a temporary. Do not
be afraid of the memcpy(), modern GCC, and I hope even Clang, can
optimize it into a single move instruction:
{
long tmp;
memcpy(&tmp, lp, sizeof(tmp));
va = (tmp - mask01);
vb = ((~tmp) & mask80);
}
Another way could be using the union trick.
> + lp++;
> + if (va & vb) {
> + /* Check if we have \0 in the first part */
> + short_stop=(const char *)lp;
> + if (stop<short_stop) short_stop=stop;
> + for (p=str; p != short_stop; p++)
> + if (*p == '\0')
> + return (p-str);
> }
> - return (len);
> + /* Scan the rest of the string using word sized operation */
> + for (; (const char *)lp < stop; lp++) {
> + va = (*lp - mask01);
> + vb = ((~*lp) & mask80);
Ditto.
> + if (va & vb) {
> + for (p=(const char *)lp; p != stop; p++)
> + if (*p == '\0')
> + break;
> + return (p-str);
> + }
> + }
> + return (maxlen);
> }
--
VZ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 293 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20130526/11781461/attachment.sig>
More information about the freebsd-hackers
mailing list