git: 0aa8b18bc9bb - main - libc: Fix regexec when sizeof(char *) > sizeof(long)

From: Jessica Clarke <jrtc27_at_FreeBSD.org>
Date: Thu, 23 Dec 2021 16:45:14 UTC
The branch main has been updated by jrtc27:

URL: https://cgit.FreeBSD.org/src/commit/?id=0aa8b18bc9bb1d948d4152c50819d69940d68045

commit 0aa8b18bc9bb1d948d4152c50819d69940d68045
Author:     Jessica Clarke <jrtc27@FreeBSD.org>
AuthorDate: 2021-12-23 16:38:10 +0000
Commit:     Jessica Clarke <jrtc27@FreeBSD.org>
CommitDate: 2021-12-23 16:38:10 +0000

    libc: Fix regexec when sizeof(char *) > sizeof(long)
    
    The states macro is the type for engine.c to use, with states1 being a
    local macro for regexec to use to determine whether it can use the small
    matcher or not (by comparing nstates and 8*sizeof(states1)). However,
    macro bodies are expanded in the context of their use, and so when
    regexec uses states1 it uses the current value of states, which is left
    over as char * from the large version (or, really, the multi-byte one,
    but that reuses large's states). For all supported architectures in
    FreeBSD, the two have the same size, and so this confusion is harmless.
    However, on architectures like CHERI where that is not the case (or
    Windows's LLP64, as discovered by LLVM and fixed in 2010 in
    2e071faed8e2) and sizeof(char *) is bigger than sizeof(long), regexec
    will erroneously try to use the small matcher when nstates is between
    8*sizeof(long) and 8*sizeof(char *) (i.e. between 64 and 128 on CHERI,
    or 32 and 64 on LLP64) and end up overflowing the number of bits in
    the underlying long if it ever uses those high states. On weirder
    architectures where sizeof(long) is greater than sizeof(char *) this
    also fixes it to not fall back on the large matcher prematurely, but
    such architectures are likely limited to the embedded space, if they
    exist at all.
    
    Fix this by swapping round states and states1, so that states1 is
    defined directly as being long and states is an alias for it for the
    small matcher case.
    
    Found by:       CHERI
---
 lib/libc/regex/regexec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/libc/regex/regexec.c b/lib/libc/regex/regexec.c
index bf27d05f86c6..d7aa46f45b2b 100644
--- a/lib/libc/regex/regexec.c
+++ b/lib/libc/regex/regexec.c
@@ -97,8 +97,8 @@ xmbrtowc_dummy(wint_t *wi,
 }
 
 /* macros for manipulating states, small version */
-#define	states	long
-#define	states1	states		/* for later use in regexec() decision */
+#define	states1	long		/* for later use in regexec() decision */
+#define	states	states1
 #define	CLEAR(v)	((v) = 0)
 #define	SET0(v, n)	((v) &= ~((unsigned long)1 << (n)))
 #define	SET1(v, n)	((v) |= (unsigned long)1 << (n))
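
The late-expansion behaviour described in the commit message can be
reproduced outside libc. The following is a minimal sketch, not code from
regexec.c: the old_/new_ macro names and the two helper functions are
hypothetical, and the #undef/#define pair merely stands in for the large
matcher redefining states as char * before regexec() evaluates
8*sizeof(states1).

```c
#include <assert.h>
#include <stddef.h>

/* Old ordering: states1 is an alias for states, resolved at each use. */
#define old_states	long
#define old_states1	old_states
#undef old_states
#define old_states	char *	/* stand-in for the large matcher's redefinition */

/* New ordering: states1 names long directly and states is the alias. */
#define new_states1	long
#define new_states	new_states1
#undef new_states
#define new_states	char *	/* same redefinition; new_states1 is unaffected */

/* Bit capacity the old decision would compute: follows char *. */
size_t
old_limit(void)
{
	return (8 * sizeof(old_states1));
}

/* Bit capacity the fixed decision computes: always that of long. */
size_t
new_limit(void)
{
	return (8 * sizeof(new_states1));
}

/* Self-check that holds on any ABI, whatever the two sizes are. */
void
check_limits(void)
{
	assert(old_limit() == 8 * sizeof(char *));
	assert(new_limit() == 8 * sizeof(long));
}
```

On ABIs where long and char * have the same size the two limits are
equal, which is why the old ordering was harmless there. They differ only
where pointers are wider than long, such as CHERI or LLP64.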