[Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Wed, 25 Sep 2024 13:30:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

            Bug ID: 281710
           Summary: RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
           Product: Base System
           Version: 14.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: standards
          Assignee: standards@FreeBSD.org
          Reporter: erichanskrs@gmail.com

There appears to be a bug in FreeBSD's sed(1), grep(1), re_format(7) in how
accented characters are handled inside a non-matching bracket expression
[^...]. It affects modern (extended) REs as well as basic REs.

-- Short examples

Command lines 202, 203 and 207 show unexpected behaviour: they print nothing,
although a match is expected.

[200] # echo '9a' | /usr/bin/sed -En 's/([^a])(a)/-\1-\2-/p'
-9-a-
[201] # echo '9a' | /usr/bin/sed -n 's/\([^a]\)\(a\)/-\1-\2-/p'
-9-a-
[202] # echo '9â' | /usr/bin/sed -n 's/\([^â]\)\(â\)/-\1-\2-/p'    # <--
[203] # echo '9â' | /usr/bin/sed -En 's/([^â])(â)/-\1-\2-/p'        # <--
[204] # echo '9â' | /usr/local/bin/gsed -En 's/([^â])(â)/-\1-\2-/p'
-9-â-
[205] # echo 'ââ' | /usr/bin/sed -En 's/([â])(â)/-\1-\2-/p'
-â-â-
[206] # echo 'ââ' | /usr/local/bin/gsed -En 's/([â])(â)/-\1-\2-/p'
-â-â-
[207] # echo '9â' | /usr/bin/grep -E '[^â]â'                         # <--
[208] #

The same results occur with characters like 'ç' and 'é'.
Reported in the forum thread (see link below) for Unicode characters.

-- Reference

FreeBSD forum link:
https://forums.freebsd.org/threads/bug-in-regexp-sed-1-grep-1-and-re_format-7.95088/

re_format(7):
"
DESCRIPTION
[...]
     A bracket expression is a list of characters enclosed in `[]'.  It
     normally matches any single character from the list (but see below).
     If the list begins with `^', it matches any single character (but see
     below) not from the rest of the list.
"

FreeBSD intends to conform to POSIX, which specifies the same behaviour:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05
"
3. A non-matching list expression begins with a <circumflex> ('^'), and the
   matching behavior shall be the logical inverse of the corresponding
   matching list expression (the same bracket expression but without the
   leading <circumflex>). For example, since the RE "[abc]" only matches 'a',
   'b', or 'c', it follows that "[^abc]" is an RE that matches any character
   except 'a', 'b', or 'c'. It is unspecified whether a non-matching list
   expression matches a multi-character collating element that is not matched
   by any of the expressions. The <circumflex> shall have this special
   meaning only when it occurs first in the list, immediately following the
   <left-square-bracket>.
"

-- Context of my OS and programs:

[100] # uname -a
FreeBSD q210 14.1-RELEASE-p5 FreeBSD 14.1-RELEASE-p5 GENERIC amd64
[101] # pkg which /usr/local/bin/ggrep
/usr/local/bin/ggrep was installed by package gnugrep-3.11
[102] # pkg which /usr/local/bin/gsed
/usr/local/bin/gsed was installed by package gsed-4.9
[103] # locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=
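
The check from command line 207 can be made repeatable across characters and grep binaries with a small script. What follows is only a sketch: the function name `check_negated` and its output strings are hypothetical and not part of the report, and it assumes the script is saved as UTF-8 and that the character passed in is an ordinary letter or digit (no regex metacharacters).

```sh
#!/bin/sh
# Hypothetical helper, not part of the original report.
# Usage: check_negated GREP_CMD CHAR
# Feeds the line "9CHAR" to GREP_CMD and tests it against the
# extended RE "[^CHAR]CHAR". Whenever CHAR is not '9', the leading
# '9' is outside the list, so POSIX semantics call for a match.
check_negated() {
    grep_cmd=$1
    ch=$2
    if printf '9%s\n' "$ch" | "$grep_cmd" -Eq "[^${ch}]${ch}"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Try an ASCII letter and the accented letters named in the report.
for ch in a â ç é; do
    printf '%s: %s\n' "$ch" "$(check_negated grep "$ch")"
done
```

Passing `/usr/bin/grep` or `/usr/local/bin/ggrep` as the first argument compares the two implementations. Going by the report, `/usr/bin/grep` on 14.1-RELEASE-p5 would be expected to print "no match" for 'â', 'ç' and 'é', whereas under the POSIX wording quoted above every line should read "match".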