[Bug 257972] collating sequence not sensible in some locales
- In reply to: bugzilla-noreply_a_freebsd.org: "[Bug 257972] collating sequence not sensible in some locales"
Date: Fri, 20 Aug 2021 15:07:48 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257972

Stefan Eßer <se@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |se@FreeBSD.org

--- Comment #2 from Stefan Eßer <se@FreeBSD.org> ---
While it is true that POSIX does not define the collating sequence for
ISO8859-1 or UTF-8, it always used to work for ISO8859-1 (as a simple
extension of ASCII).

The really surprising result is that ISO8859-1 obviously includes lower case
letters in the range [A-Z] (it never did before!), while UTF-8 excludes them
(and the common practice in Unicode is to have a collating sequence of
"aAbBcC..." for Latin-based character sets).

There is obviously code that applies some collating sequence rules, but
opposite to what I'd expect.

The Linux example shows that they decided to use the traditional collating
sequence in any locale, including ISO8859-1 and UTF-8 (and as said, POSIX does
not care at all).

We could make ISO8859-1 use the traditional collating sequence and UTF-8 the
Unicode convention of lower case just before the upper case letter, or we
could always apply the traditional collating sequence, but we should
definitely not use traditional for UTF-8 and Unicode style for ISO8859-1.

-- 
You are receiving this mail because:
You are the assignee for the bug.
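
To illustrate why the range [A-Z] can pick up lower case letters, here is a minimal Python sketch that simulates the two collating sequences discussed above. It does not call into any libc or locale code; the names TRADITIONAL, INTERLEAVED, in_range and range_members are hypothetical and exist only for this example, and the two orderings are simplified models restricted to the 52 ASCII letters.

```python
import string

# Simplified, hypothetical models of the two collating sequences.
# TRADITIONAL: plain ASCII order, "AB...Zab...z".
TRADITIONAL = string.ascii_uppercase + string.ascii_lowercase
# INTERLEAVED: lower case just before upper case, "aAbBcC...zZ".
INTERLEAVED = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range(ch, lo, hi, order):
    """Return True if ch collates between lo and hi (inclusive) in order."""
    rank = {c: i for i, c in enumerate(order)}
    return rank[lo] <= rank[ch] <= rank[hi]


def range_members(lo, hi, order):
    """Return every letter of order that falls inside the range lo-hi."""
    return "".join(c for c in order if in_range(c, lo, hi, order))


if __name__ == "__main__":
    print("traditional [A-Z]:", range_members("A", "Z", TRADITIONAL))
    print("interleaved [A-Z]:", range_members("A", "Z", INTERLEAVED))
```

Under the interleaved model the range runs from "A" to "Z" in collation order, so it covers every letter except "a", which sorts just before "A". Under the traditional model it covers the upper case letters only.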