problem with ls, not show a correct list
Garrett Wollman
wollman at hergotha.csail.mit.edu
Fri Apr 7 05:29:09 UTC 2017
In article <3a8b8ade882d1486aa41b448a9c83b6c at i805.com.br> you write:
>
>
> It's a terrible!!!! Is it a locale bug? Look!
>
>% locale
>LANG=pt_BR.UTF-8
>% touch E
>% ls -l [a-z]*
>-rw-r--r-- 1 rizzo wheel 0 7 abr 02:06 E
No, it's the specification of how character ranges in glob(3) and
fnmatch(3) work. In effect, character ranges like [a-z] must be
treated as ranges of *collating elements*, not byte ranges, and in
your locale, <a> and <A> are considered to be the same collating
element, so [a-z] matches both upper- and lower-case Latin letters.
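One way to see the effect without creating any files is to test the same range in a case pattern under two locales. This is only a sketch: in_az_range is a made-up helper name, and the one assumption is that in the C locale ranges follow plain byte order, so the upper-case E falls outside [a-z]. What the last call prints depends on your own LC_COLLATE.

```shell
# Hypothetical helper: report whether a name matches the range pattern [a-z]*
in_az_range() {
    case $1 in
        [a-z]*) echo "match" ;;
        *)      echo "no match" ;;
    esac
}

# In the C locale the range is byte-ordered, so this prints "no match".
( LC_ALL=C; export LC_ALL; in_az_range E )

# In the current locale the answer depends on LC_COLLATE.
in_az_range E
```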
This is documented, very obliquely, in sh(1), which also tells you the
workaround:
     a character class.  A character class matches any of the characters
     between the square brackets.  A locale-dependent range of characters may
     be specified using a minus sign.  A named class of characters (see
     wctype(3)) may be specified by surrounding the name with `[:' and `:]'.
     For example, `[[:alpha:]]' is a shell pattern that matches a single
     letter.
So, to match only lower-case letters regardless of your current locale
setting, you must use the correct character class:
$ locale
LANG=pt_BR.UTF-8
LC_CTYPE="pt_BR.UTF-8"
LC_COLLATE="pt_BR.UTF-8"
LC_TIME="pt_BR.UTF-8"
LC_NUMERIC="pt_BR.UTF-8"
LC_MONETARY="pt_BR.UTF-8"
LC_MESSAGES="pt_BR.UTF-8"
LC_ALL=
$ ls
D E F a b c
$ ls [[:lower:]]*
a b c
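The same named class works in a case pattern if you need this in a script rather than at the prompt. A minimal sketch, where lower_only is a made-up helper name and not anything from sh(1):

```shell
# Hypothetical helper: print only the arguments that begin with a
# lower-case letter, using the named class instead of a range.
lower_only() {
    for name in "$@"; do
        case $name in
            [[:lower:]]*) printf '%s\n' "$name" ;;
        esac
    done
}

lower_only D E F a b c
```

Because the pattern names the class rather than a range, the result should not change with LC_COLLATE.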
The same applies to character ranges in regular-expression bracket
expressions, not just glob(3) patterns.
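As a rough illustration on the regular-expression side, assuming a POSIX grep, the named class can replace the range in a bracket expression. lower_lines is a made-up helper name:

```shell
# Hypothetical helper: keep only input lines that start with a
# lower-case letter, using a named class inside the bracket expression.
lower_lines() {
    grep '^[[:lower:]]'
}

printf '%s\n' D E F a b c | lower_lines
```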
-GAWollman
More information about the freebsd-current mailing list