[Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Fri, 27 Sep 2024 09:17:27 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

--- Comment #13 from Eric <erichanskrs@gmail.com> ---
(in reply to Kyle Evans comment #10)
(in reply to Olivier Certner comment #12)

Based on the commit message of https://cgit.freebsd.org/src/commit/?id=8f7ed58a15556bf567ff876e1999e4fe4d684e1d, however, I see that I may have underestimated the potentially serious impact on string processing in a pervasively UTF-8 world.

I don't have a test setup available at the moment to run the examples below on -CURRENT, 13-STABLE or 14-STABLE.

Examples

[1] # cat names
cedric
étienne
égards
françois
[2] # cat names | grep '[é]'
étienne
égards
[3] # cat names | grep '[éç]'
étienne
égards
françois
[4] # cat names | grep '[éi]'       # <-- error: "égards" is missing
cedric
étienne
françois
[5] # cat names | grep -i '[éi]'    # <-- case-insensitive "avoids" singleton
cedric
étienne
égards
françois
[6] # cat names | grep -E '[é]|[i]' # <-- splitting into two bracket expressions avoids the erroneous code
cedric
étienne
égards
françois
[7] #

I suspect such cases will have been overlooked, misjudged as correctly processed, or not investigated further. Fast and correct (UTF-8) string processing is difficult, and this prompted me to take another look at how singleton processes characters.

Viewed from a distance (and counting only one test operation, the first, in each chain of short-circuit || operands), the distance to the prize (i.e. line 1626) in
https://github.com/freebsd/freebsd-src/blob/main/lib/libc/regex/regcomp.c#L1626
compared with
https://github.com/freebsd/freebsd-src/blob/releng/14.1/lib/libc/regex/regcomp.c#L1600
has gone up considerably:

  singleton-error:    2 tests
  singleton-modified: 6 tests

Are the added complexity and extra processing steps of a separate singleton function for bracket expressions still justified? Case-insensitive bracket expressions don't profit, as the examples above painfully show; they just pay a small amount of additional time.
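One reading of examples [4] and [5] is that a singleton check which counts only the single-byte members of a bracket set would collapse [éi] to the literal "i" (dropping é), while -i adds "I" as a second member and so escapes that path. The sketch below models that reading in C. It is a simplified, hypothetical model: struct model_set, set_add(), singleton_narrow_only(), singleton_checked(), NNARROW and E_ACUTE are illustrative names, and this is neither the cset code in regcomp.c nor the fix in the commit above.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model, NOT the cset from lib/libc/regex/regcomp.c.
 * In a UTF-8 locale only ASCII fits in one byte, so the model keeps a
 * 128-entry bitmap for those and a short list for multibyte members. */
#define NNARROW 128
#define MAXWIDES 8
#define NOT_SINGLETON WEOF
#define E_ACUTE ((wint_t)0x00E9) /* U+00E9, "é" */

struct model_set {
	unsigned char narrow[NNARROW]; /* narrow[c] != 0 if ASCII c is a member */
	wint_t wides[MAXWIDES];        /* multibyte members as code points */
	size_t nwides;
};

void set_add(struct model_set *s, wint_t c)
{
	if (c < NNARROW) {
		s->narrow[c] = 1;
	} else {
		assert(s->nwides < MAXWIDES);
		s->wides[s->nwides++] = c;
	}
}

/* Count the ASCII members and remember the last one seen. */
static size_t count_narrow(const struct model_set *s, wint_t *last)
{
	size_t n = 0;

	for (wint_t c = 0; c < NNARROW; c++)
		if (s->narrow[c]) {
			n++;
			*last = c;
		}
	return n;
}

/* Flawed check: looks only at the ASCII bitmap, so [éi] collapses to
 * the literal 'i' and the multibyte member 'é' is silently dropped. */
wint_t singleton_narrow_only(const struct model_set *s)
{
	wint_t last = 0;
	size_t n = count_narrow(s, &last);

	if (n == 1)
		return last;
	if (n == 0 && s->nwides == 1)
		return s->wides[0];
	return NOT_SINGLETON;
}

/* Checked variant: a set is a singleton only if it has exactly one
 * member in total, counting both the bitmap and the multibyte list. */
wint_t singleton_checked(const struct model_set *s)
{
	wint_t last = 0;
	size_t n = count_narrow(s, &last);

	if (n == 1 && s->nwides == 0)
		return last;        /* [i] -> literal 'i' */
	if (n == 0 && s->nwides == 1)
		return s->wides[0]; /* [é] -> literal 'é' */
	return NOT_SINGLETON;   /* [éi], [éç]: keep the bracket set */
}
```

In this model, a set built from E_ACUTE and 'i' yields 'i' from singleton_narrow_only() but NOT_SINGLETON from singleton_checked(). Adding 'I' as well (as -i would) leaves two ASCII members, so both variants return NOT_SINGLETON, which mirrors example [5]. The extra conditions in the checked variant are the kind of additional tests counted above.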
I wonder whether comparative testing with singleton processing versus without it yields justifiable gains (yes, "justifiable" is a subjective adjective).
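Whether singleton processing pays for itself is an empirical question. Below is a minimal sketch of a timing helper built only on POSIX regcomp(), regexec() and clock_gettime(); ns_per_match() is a hypothetical name. It cannot switch singleton processing off by itself, so the assumption is that the same program is linked against two libc builds, one with singleton() and one with it disabled, and that the caller selects a UTF-8 locale with setlocale(LC_ALL, "") beforehand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: average wall-clock nanoseconds per regexec()
 * call of `pattern` against `subject`, or -1.0 if the iteration count
 * is not positive or the pattern fails to compile. */
double ns_per_match(const char *pattern, int cflags,
                    const char *subject, long iterations)
{
	regex_t re;
	struct timespec t0, t1;

	assert(pattern != NULL && subject != NULL);
	if (iterations <= 0 || regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iterations; i++)
		(void)regexec(&re, subject, 0, NULL, 0); /* match result unused */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	regfree(&re);

	double elapsed = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec);
	return elapsed / (double)iterations;
}
```

For instance, comparing ns_per_match("[i]", 0, line, n) with ns_per_match("i", 0, line, n) on each build would show how much a one-member bracket gains from being collapsed to a literal, and repeating the runs for "[éi]" and with REG_ICASE would cover the cases from the examples. Wall-clock timings are noisy, so a large n and several runs per build would be needed before drawing conclusions.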