[Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Wed, 25 Sep 2024 20:44:12 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

--- Comment #9 from commit-hook@FreeBSD.org ---
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d96ce6d000703f3f57d9214b741e16cc7741d77e

commit d96ce6d000703f3f57d9214b741e16cc7741d77e
Author:     Bill Sommerfeld <sommerfeld@hamachi.org>
AuthorDate: 2023-12-21 03:46:14 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-09-25 20:42:28 +0000

    regex: mixed sets are misidentified as singletons

    Fix the "singleton" function used by regcomp() to turn character set
    matches into exact character matches if a character set has exactly
    one element.

    The underlying cset representation is complex; most critically it
    records "small" characters (codepoint less than either 128 or 256,
    depending on locale) in a bit vector, and "wide" characters in a
    secondary array.

    Unfortunately the "singleton" function used to identify singleton
    sets treated a cset as a singleton if either the "small" or the
    "wide" set had exactly one element (it would then ignore the other
    set).

    The easiest way to demonstrate this bug:

        $ export LANG=C.UTF-8
        $ echo 'a' | grep '[abà]'

    It should match (and print "a"), but instead it doesn't match because
    the set is misidentified as a singleton containing only the accented
    character.

    PR:             281710
    Reviewed by:    kevans, yuripv
    Obtained from:  illumos

    (cherry picked from commit 8f7ed58a15556bf567ff876e1999e4fe4d684e1d)

 lib/libc/regex/regcomp.c          | 25 ++++++++++++++++++-----
 lib/libc/tests/regex/multibyte.sh | 43 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 6 deletions(-)
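
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct toy_cset`, `toy_add`, `singleton_buggy`, and `singleton_fixed` are hypothetical names, the 128-codepoint cutoff and array capacity are arbitrary, and none of this is the actual code in lib/libc/regex/regcomp.c. It only mirrors the described split between a bit vector for "small" characters and a secondary array for "wide" ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSMALL  128	/* hypothetical cutoff for "small" codepoints */
#define MAXWIDE 8	/* hypothetical capacity of the wide array */

/* Hypothetical, simplified character set; NOT the real libc cset. */
struct toy_cset {
	uint8_t  bits[NSMALL / 8];	/* bit vector for small codepoints */
	uint32_t wides[MAXWIDE];	/* secondary array for wide codepoints */
	size_t   nwides;
};

/* Add one codepoint to whichever half of the set it belongs in. */
void
toy_add(struct toy_cset *cs, uint32_t ch)
{
	if (ch < NSMALL) {
		cs->bits[ch / 8] |= (uint8_t)(1u << (ch % 8));
	} else {
		assert(cs->nwides < MAXWIDE);
		cs->wides[cs->nwides++] = ch;
	}
}

/* Count small members; *first receives the lowest one, if any. */
size_t
toy_count_small(const struct toy_cset *cs, uint32_t *first)
{
	size_t n = 0;

	for (uint32_t ch = 0; ch < NSMALL; ch++) {
		if (cs->bits[ch / 8] & (1u << (ch % 8))) {
			if (n == 0)
				*first = ch;
			n++;
		}
	}
	return (n);
}

/* Buggy variant: accepts if EITHER half has exactly one member. */
bool
singleton_buggy(const struct toy_cset *cs, uint32_t *out)
{
	uint32_t first = 0;
	size_t nsmall = toy_count_small(cs, &first);

	if (nsmall == 1) {
		*out = first;
		return (true);
	}
	if (cs->nwides == 1) {
		*out = cs->wides[0];	/* the small half is ignored here */
		return (true);
	}
	return (false);
}

/* Fixed variant: both halves together must hold exactly one member. */
bool
singleton_fixed(const struct toy_cset *cs, uint32_t *out)
{
	uint32_t first = 0;
	size_t nsmall = toy_count_small(cs, &first);

	if (nsmall + cs->nwides != 1)
		return (false);
	*out = (nsmall == 1) ? first : cs->wides[0];
	return (true);
}
```

In this model, a zero-initialized set filled with 'a', 'b', and 0xE0 (U+00E0, "à") stands in for `[abà]`. `singleton_buggy` reports it as the singleton 0xE0, so a matcher that trusts the result would compare the input only against "à" and reject "a". `singleton_fixed` counts both halves and declines to treat the set as a singleton.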