From: bugzilla-noreply@freebsd.org
To: standards@FreeBSD.org
Subject: [Bug 281710] RegEXP bug in bracket expression [^...] - sed(1), grep(1), re_format(7)
Date: Wed, 25 Sep 2024 13:30:34 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281710

            Bug ID: 281710
           Summary: RegEXP bug in bracket expression [^...] - sed(1), grep(1),
                    re_format(7)
           Product: Base System
           Version: 14.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: standards
          Assignee: standards@FreeBSD.org
          Reporter: erichanskrs@gmail.com

It looks like there's a bug in FreeBSD's sed(1), grep(1) and re_format(7)
regarding accented characters and their use in a bracket expression [^...]
in regular expressions (modern REs as well as basic REs).

-- Short examples

Command lines 202, 203 and 207 (marked with <--) show unexpected behaviour:
they produce no output, whereas a match is expected.

[200] # echo '9a' | /usr/bin/sed -En 's/([^a])(a)/-\1-\2-/p'
-9-a-
[201] # echo '9a' | /usr/bin/sed -n 's/\([^a]\)\(a\)/-\1-\2-/p'
-9-a-
[202] # echo '9â' | /usr/bin/sed -n 's/\([^â]\)\(â\)/-\1-\2-/p'        # <--
[203] # echo '9â' | /usr/bin/sed -En 's/([^â])(â)/-\1-\2-/p'            # <--
[204] # echo '9â' | /usr/local/bin/gsed -En 's/([^â])(â)/-\1-\2-/p'
-9-â-
[205] # echo 'ââ' | /usr/bin/sed -En 's/([â])(â)/-\1-\2-/p'
-â-â-
[206] # echo 'ââ' | /usr/local/bin/gsed -En 's/([â])(â)/-\1-\2-/p'
-â-â-
[207] # echo '9â' | /usr/bin/grep -E '[^â]â'                            # <--
[208] #

Same results with characters like 'ç' and 'é'.
Also reported in the forum thread (see link below) for other Unicode
characters.

-- Reference

FreeBSD forum link:
https://forums.freebsd.org/threads/bug-in-regexp-sed-1-grep-1-and-re_format-7.95088/

re_format(7):
"
DESCRIPTION
[...]
     A bracket expression is a list of characters enclosed in `[]'.  It
     normally matches any single character from the list (but see below).
     If the list begins with `^', it matches any single character (but see
     below) not from the rest of the list.
"

As FreeBSD intends/tries to conform to POSIX, likewise:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05
"
3. A non-matching list expression begins with a <circumflex> ('^'), and the
matching behavior shall be the logical inverse of the corresponding matching
list expression (the same bracket expression but without the leading
<circumflex>). For example, since the RE "[abc]" only matches 'a', 'b', or
'c', it follows that "[^abc]" is an RE that matches any character except 'a',
'b', or 'c'. It is unspecified whether a non-matching list expression matches
a multi-character collating element that is not matched by any of the
expressions. The <circumflex> shall have this special meaning only when it
occurs first in the list, immediately following the <left-square-bracket>.
"

-- Context of my OS and programs:

[100] # uname -a
FreeBSD q210 14.1-RELEASE-p5 FreeBSD 14.1-RELEASE-p5 GENERIC amd64
[101] # pkg which /usr/local/bin/ggrep
/usr/local/bin/ggrep was installed by package gnugrep-3.11
[102] # pkg which /usr/local/bin/gsed
/usr/local/bin/gsed was installed by package gsed-4.9
[103] # locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=
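
-- Possible reproducer script (sketch)

The checks in command lines 203 and 204 could be repeated for several
characters with a small script along the following lines. This is only a
sketch: the function name `probe` and the `SED` variable are made up for
illustration, and the character list is simply the one mentioned in this
report. It assumes the script is saved as UTF-8 and run under a UTF-8 locale
such as the C.UTF-8 shown in the report. It prints "match" when the
substitution yields the expected `-9-X-` line and "no match" otherwise; no
particular result is claimed here for any given sed implementation.

```shell
#!/bin/sh
# Sketch: check whether the ERE "([^X])(X)" matches the input "9X" for a
# given sed binary and a single character X.
# "probe" and "SED" are illustrative names, not part of any FreeBSD tool.

probe() {
    sed_bin=$1   # sed binary to test, e.g. /usr/bin/sed or /usr/local/bin/gsed
    ch=$2        # one character; must not be special in a RE (no ']', '/', '\')
    # Same substitution as command line 203, with the character parameterised.
    out=$(printf '9%s\n' "$ch" | "$sed_bin" -En "s/([^$ch])($ch)/-\1-\2-/p")
    if [ "$out" = "-9-$ch-" ]; then
        echo "match"
    else
        echo "no match"
    fi
}

# Defaults to the base-system sed; override with e.g. SED=/usr/local/bin/gsed
SED=${SED:-/usr/bin/sed}

for ch in a â ç é; do
    printf '%s %s: %s\n' "$SED" "$ch" "$(probe "$SED" "$ch")"
done
```

Running it once with the default and once with SED=/usr/local/bin/gsed would
give a side-by-side comparison of the two implementations for each character.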