bug with special bracket expressions in regular expressions
Kimmo Paasiala
kpaasial at gmail.com
Mon Sep 2 17:52:20 UTC 2013
On Mon, Sep 2, 2013 at 7:45 PM, Andriy Gapon <avg at freebsd.org> wrote:
> on 02/09/2013 17:54 Andriy Gapon said the following:
>>
>> re_format(7) says:
>> There are two special cases‡ of bracket expressions: the bracket
>> expressions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the beginning and
>> end of a word respectively. A word is defined as a sequence of word
>> characters which is neither preceded nor followed by word characters. A
>> word character is an alnum character (as defined by ctype(3)) or an
>> underscore. This is an extension, compatible with but not specified by
>> IEEE Std 1003.2 (“POSIX.2”), and should be used with caution in software
>> intended to be portable to other systems.
>>
>> However I observe the following:
>> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
>> xx
>> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
>> cd1 xx
>>
>> In my opinion '[[:<:]]' should not affect how the pattern is matched in this case.
>
> It seems that the code works like this:
> - first it matches "cd0 " and "removes" it
> - then it passes "cd1 xx" for matching with a flag that says this is not
>   the real start of the string
> - thus the matching code
>   o knows that this is not a real line start, so it can't match [[:<:]]
>     on that basis alone
>   o does _not_ know which character preceded the start of the given
>     substring, so it cannot tell whether [[:<:]] could match there
>
> So matching fails.
> I'm not sure whether this is an internal problem of regex(3) or a problem
> with how sed(1) uses regex(3).
>
> --
> Andriy Gapon
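To make the mechanism above concrete, here is a minimal C sketch. It assumes the flag Andriy describes is REG_NOTBOL from regex(3); the thread does not name it. Because [[:<:]] is a non-portable extension, the sketch uses the portable '^' anchor to show how the same substring stops matching once the caller says it is not the start of the line. The helper name matches() is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if `pattern` (an extended RE) matches
 * somewhere in `text` when regexec() is called with `eflags`.
 * Returns false on no match or if the pattern fails to compile.
 */
bool matches(const char *pattern, const char *text, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    /* REG_NOTBOL in eflags tells regexec() that text[0] is not a line start. */
    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

With this helper, matches("^cd[0-9]", "cd1 xx", 0) succeeds, while the same call with REG_NOTBOL fails, because the matcher is handed only the remainder of the line and is told not to treat its first character as an anchor point. A start-of-word check on that remainder has the same lack of context.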
In my opinion this is a bug. The [[:<:]] operator is documented to match
the empty string at the beginning of a word, with no mention that the
word has to be at the beginning of the whole string being matched. The
OS X version of sed(1) behaves differently:
$ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
xx
$ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
xx
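The behaviour I would expect from the re_format(7) definition can be sketched in C as follows. This is only an illustration of the definition, not the regex(3) implementation; the names is_word_char() and word_starts_at() are hypothetical. The point is that the check needs the character before the candidate position, so it has to see the whole line rather than just the unmatched remainder.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* A word character per re_format(7): an alnum character or an underscore. */
bool is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/*
 * Hypothetical helper: true if a word begins at buf[pos], i.e. buf[pos] is
 * a word character and is not preceded by one. The whole buffer is passed
 * in so that buf[pos - 1] is available when pos > 0.
 */
bool word_starts_at(const char *buf, size_t len, size_t pos)
{
    if (pos >= len || !is_word_char((unsigned char)buf[pos]))
        return false;
    return pos == 0 || !is_word_char((unsigned char)buf[pos - 1]);
}
```

For the line "cd0 cd1 xx" (length 10), word_starts_at() is true at offsets 0, 4 and 8, so "cd1" at offset 4 is a word start even though it is not at the beginning of the line.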
-Kimmo
More information about the freebsd-current mailing list