Re: find(1): I18N gone wild ?
- In reply to: Xin LI : "Re: find(1): I18N gone wild ?"
Date: Mon, 17 Apr 2023 21:33:04 UTC
Xin LI wrote:
> This is expected behavior (in en_US.UTF-8 the ordering is AaBb, not
> ABab). You might want to set LC_COLLATE to C if C behavior is desirable.
>
> On Mon, Apr 17, 2023 at 2:06 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
> wrote:
>
>     This surprised me:
>
>         # mkdir /tmp/P
>         # cd /tmp/P
>         # touch FOO
>         # touch bar
>         # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>         ./FOO
>         # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>         ./FOO
>         ./bar
>
>     Really ?!

A bit more detail: find uses fnmatch(3) here, where the RE Bracket
Expression rules apply (except for ! instead of ^, but that's unrelated):

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

...which has the following note:

    7. In the POSIX locale, a range expression represents the set of
       collating elements that fall between two elements in the collation
       sequence, inclusive. In other locales, a range expression has
       unspecified behavior: strictly conforming applications shall not
       rely on whether the range expression is valid, or on the set of
       collating elements matched.

Indeed, it's unfortunate that collations in non-POSIX are not that...
linear and range expressions can break, but I don't see an easy way of
"fixing" this.
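
For anyone who only needs the C behavior for a single invocation, here is a minimal sketch of the workaround mentioned above. It is an illustration, not a fix: the scratch directory from mktemp and the variable names are my own, and I use LC_ALL rather than LC_COLLATE on the assumption that LC_ALL may already be set in the caller's environment, in which case it would override LC_COLLATE.

```sh
#!/bin/sh
# Recreate the two files from the report in a scratch directory.
# (The directory and variable names are illustrative, not from the thread.)
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch FOO bar

# Force the POSIX ("C") locale for this one command only. In that locale
# the range [A-Z] is defined by the collation sequence of the POSIX locale,
# so only names starting with an uppercase ASCII letter match.
matches=$(LC_ALL=C find . -name '[A-Z]*' -print)
echo "$matches"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

With the locale pinned this way, the result no longer depends on whatever LANG the user happens to have. The range is still formally unspecified in any other locale, per the note quoted above.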