Question about ASCII and nl_langinfo (locale work)
John Marino
freebsd.contact at marino.st
Sat Nov 14 12:19:19 UTC 2015
On 11/11/2015 5:59 PM, Andrey Chernov wrote:
> On 11.11.2015 1:26, Baptiste Daroussin wrote:
>> The thing is, not everyone is aware that FreeBSD uses US-ASCII; tcl, for
>> example, is not, which means tcl is not able to determine what encoding is
>> needed for the C and POSIX locales.
>>
>> On Linux they return ANSI_X3.4-1968 (also known as US-ASCII) and most
>> applications know what Linux returns.
>>
>> That means we need to teach all upstream about US-ASCII all the time.
>>
>> The proposals are:
>> - Do not change what we have always done.
>> - Change it to something that makes sense: "C" (we tried "POSIX", which
>> was a very bad idea, but "C" seems to be commonly recognised by applications
>> as ASCII)
>> - Let's report the same as Linux, that will simplify portability
>> - Let's be obvious and report ASCII (also commonly recognised by applications)
>
> Just repeating my opinion in this new thread.
>
> Since POSIX doesn't say anything definite, we should be Linux compatible
> here to cause fewer surprises, i.e.:
> 1) Return "ANSI_X3.4-1968" for C/POSIX locale (was "US-ASCII").
> 2) Return "ASCII" for *.US-ASCII locales (was "US-ASCII").
> A typical Linux program knows nothing about our "US-ASCII", and porting
> rarely handles it.
>
> Not doing that leads to hidden, hard-to-find bugs like the one still present
> right now in our tcl ports. For all these years tcl has not understood the
> FreeBSD-native nl_langinfo() "US-ASCII" and falls back to "iso8859-1"
> (it understands Linux "ANSI_X3.4-1968" and "ASCII" of course).
>
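To illustrate the failure mode described in the quoted text, here is a minimal sketch of an application-side alias lookup. The table, the function name pick_encoding and the fallback constant are hypothetical and are not taken from tcl's sources; only the observed behaviour (two names recognised, "US-ASCII" falling through to "iso8859-1") comes from the description above.

```python
# Hypothetical alias table; NOT copied from tcl's sources.
# It recognises the two names the quoted text says tcl understands.
KNOWN_ALIASES = {
    "ANSI_X3.4-1968": "ascii",
    "ASCII": "ascii",
}

# Fallback named in the quoted text.
FALLBACK = "iso8859-1"


def pick_encoding(codeset, aliases=KNOWN_ALIASES):
    """Map an nl_langinfo(CODESET) string to an internal encoding name."""
    return aliases.get(codeset, FALLBACK)


# Teaching the application about the FreeBSD name is a one-line change,
# but it has to be repeated for every upstream project.
PATCHED_ALIASES = dict(KNOWN_ALIASES)
PATCHED_ALIASES["US-ASCII"] = "ascii"
```

With the unpatched table, pick_encoding("US-ASCII") silently yields the fallback rather than an error, which is why this kind of bug can stay hidden for years.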
As a DragonFly representative (and probably the person that would
implement it), I can accept Andrey's proposal.
What it would mean:
1) "ANSI_X3.4-1968" would be the one return value of
nl_langinfo(CODESET) that is not in the output of "locale -m"
2) This would require an alteration to usr.bin/locale to add this
"ANSI_X3.4-1968" if not found (similar to how it's done for US-ASCII)
3) At the same time usr.bin/locale would be modified to change the check
from "US-ASCII" to "ASCII"
4) The locale tools would have to be modified to change all source and
map references from "US-ASCII" to "ASCII" and the six LC* generating
makefiles regenerated
5) nl_langinfo would be changed to return "ANSI_X3.4-1968" instead of
"US-ASCII" if the encoding equals "NONE"
6) The "make upgrade" utility would need to remove *.US-ASCII locales
7) Do we really need 6 ".ASCII" locales? They have very limited use; I'd
suggest just having "en_US.ASCII" and that's it. Dump en_AU, en_ZA,
en_GB, etc. We can keep all 6 if we want, but if we are removing
US-ASCII anyway, we should limit the locales to what makes sense.
Alternatively FreeBSD could link US-ASCII => ASCII and have both
variations but I think DragonFly will just drop US-ASCII in this case.
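As a rough sketch of what items 4 and 5 amount to, the mapping from a locale's internal encoding name to the string reported by nl_langinfo(CODESET) could look like this. The function name proposed_codeset is hypothetical, and this is an illustration of the proposal, not the libc implementation.

```python
def proposed_codeset(encoding):
    """Return what nl_langinfo(CODESET) would report under the proposal.

    `encoding` is the locale's internal encoding name. This is a
    hypothetical illustration, not actual libc code.
    """
    if encoding == "NONE":
        # Item 5: C/POSIX locale, previously reported as "US-ASCII".
        return "ANSI_X3.4-1968"
    if encoding == "US-ASCII":
        # Item 4: the *.US-ASCII locales are renamed to *.ASCII.
        return "ASCII"
    # All other encodings are reported unchanged.
    return encoding
```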
What nl_langinfo(CODESET) returns has to be reflected in the locale name
(with the exception of "ANSI_X3.4-1968") so there has to be e.g.
en_US.ASCII as a valid locale if US-ASCII is changed.
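The naming rule in the previous paragraph can be expressed as a small check. This is only a sketch: the helper name codeset_consistent is made up, and it assumes locale names of the form lang_TERRITORY.CODESET with an optional @modifier.

```python
def codeset_consistent(locale_name, codeset):
    """Check that a locale name agrees with its reported codeset.

    Hypothetical helper. The only exception to the rule is the
    C/POSIX locale, which carries no codeset suffix and would
    report "ANSI_X3.4-1968".
    """
    # Drop an optional "@modifier" before looking at the suffix.
    name = locale_name.partition("@")[0]
    base, _, suffix = name.partition(".")
    if not suffix:
        return base in ("C", "POSIX") and codeset == "ANSI_X3.4-1968"
    return suffix == codeset
```

Under this rule en_US.ASCII paired with "ASCII" passes, while a locale still named en_US.US-ASCII that reports "ASCII" does not.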
There might be other changes necessary if "US-ASCII" is changed; I'd
have to do a thorough review.
To get started, I think this needs to be decided:
A) Confirm we want locale -m and nl_langinfo(CODESET) to return
"ANSI_X3.4-1968" for C/POSIX locales
B) Confirm renaming US-ASCII locales to ASCII
C) (FreeBSD only) Decide if you want to conserve US-ASCII locales with
symlinks. nl_langinfo(CODESET) will return "ASCII" for these symlinked
locales
D) Decide which "ASCII" locales are really needed. (I suggest one,
en_US.ASCII)
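For anyone wanting to compare platforms while deciding A), a short script along these lines prints what the running system reports for the C locale. It is a sketch that assumes a Unix-like system where Python's locale.nl_langinfo is available; the function name c_locale_codeset is arbitrary.

```python
import locale


def c_locale_codeset():
    """Return nl_langinfo(CODESET) as reported in the "C" locale."""
    # Remember the current LC_CTYPE setting so it can be restored.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, "C")
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # The printed string depends on the platform's libc.
    print(c_locale_codeset())
```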
Thanks,
John
More information about the freebsd-arch
mailing list