Question about ASCII and nl_langinfo (locale work)

Sat Nov 14 12:19:19 UTC 2015

On 11/11/2015 5:59 PM, Andrey Chernov wrote:
> On 11.11.2015 1:26, Baptiste Daroussin wrote:
>> The thing is that not everyone is aware that FreeBSD uses US-ASCII; tcl, for
>> example, is not, which means tcl is not able to determine what encoding is
>> needed for the C and POSIX locales.
>>
>> Linux returns ANSI_X3.4-1968 (also known as US-ASCII), and most
>> applications know what Linux returns.
>>
>> That means we need to teach all upstream about US-ASCII all the time.
>>
>> The proposals are:
>> - Do not change what we have always done.
>> - Change it to something that makes sense: "C" (we tried "POSIX", which
>>   was a very bad idea, but "C" seems to be commonly recognised by applications
>>   as ASCII)
>> - Let's report the same as Linux, that will simplify portability
>> - Let's be obvious and report ASCII (also commonly recognised by applications)
> 
> Just repeating my opinion in this new thread.
> 
> Since POSIX does not specify anything definite, we should be Linux compatible
> here to cause less surprise, i.e.:
> 1) Return "ANSI_X3.4-1968" for C/POSIX locale (was "US-ASCII").
> 2) Return "ASCII" for *.US-ASCII locales (was "US-ASCII").
> A typical Linux program knows nothing about our "US-ASCII", and ports
> rarely handle it.
> 
> Not doing that leads to hidden, hard-to-find bugs like the one still present
> right now in our tcl ports. For all these years tcl has not understood
> FreeBSD-native nl_langinfo() "US-ASCII" and falls back to "iso8859-1"
> (it understands Linux "ANSI_X3.4-1968" and "ASCII" of course).
> 

As a DragonFly representative (and probably the person that would
implement it), I can accept Andrey's proposal.

What it would mean:
1) "ANSI_X3.4-1968" would be the one return value of
nl_langinfo(CODESET) that is not in the output of "locale -m"

2) This would require an alteration to usr.bin/locale to add
"ANSI_X3.4-1968" if not found (similar to how it's done for US-ASCII)

3) At the same time usr.bin/locale would be modified to change the check
from "US-ASCII" to "ASCII"

4) The locale tools would have to be modified to change all source and
map references from "US-ASCII" to "ASCII" and the six LC* generating
makefiles regenerated

5) nl_langinfo would be changed to return "ANSI_X3.4-1968" instead of
"US-ASCII" if the encoding equals "NONE"

6) The "make upgrade" utility would need to remove *.US-ASCII locales

7) Do we really need 6 ".ASCII" locales?  They have very limited use; I'd
suggest just having "en_US.ASCII" and that's it.  Dump en_AU, en_ZA,
en_GB, etc.  We can keep all 6 if we want, but if we are removing
US-ASCII anyway, we should limit the locales to what makes sense.
Alternatively FreeBSD could link US-ASCII => ASCII and have both
variations, but I think DragonFly will just drop US-ASCII in this case.
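
To make item 5 (and Andrey's point 2) concrete, here is a rough sketch of
the name mapping being proposed.  This is not the actual libc code;
proposed_codeset() is a hypothetical helper, and its "encoding" argument is
assumed to stand for the encoding name recorded for the locale.

```c
#include <string.h>

/*
 * Hypothetical helper, for illustration only: given the encoding name
 * recorded for a locale, return what nl_langinfo(CODESET) would report
 * under the proposal.
 */
const char *
proposed_codeset(const char *encoding)
{
	/* C/POSIX locales have encoding "NONE": report the Linux name. */
	if (strcmp(encoding, "NONE") == 0)
		return ("ANSI_X3.4-1968");
	/* Old-style *.US-ASCII locales (e.g. kept as symlinks) report "ASCII". */
	if (strcmp(encoding, "US-ASCII") == 0)
		return ("ASCII");
	/* Every other encoding name is reported unchanged. */
	return (encoding);
}
```

If the US-ASCII locales are renamed outright rather than linked, the second
branch would never be reached and could be dropped.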

What nl_langinfo(CODESET) returns has to be reflected in the locale name
(with the exception of "ANSI_X3.4-1968") so there has to be e.g.
en_US.ASCII as a valid locale if US-ASCII is changed.
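
The naming rule above could be checked with something like the following.
It is only a sketch: name_reflects_codeset() is a made-up function, and it
assumes locale names of the form lang_TERRITORY.CODESET with an optional
@modifier suffix.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: does the codeset part of a locale name match the
 * string nl_langinfo(CODESET) reports for that locale?
 */
bool
name_reflects_codeset(const char *name, const char *codeset)
{
	const char *dot, *at;
	size_t len;

	/* The one exception: C/POSIX carry no codeset in their name. */
	if (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0)
		return (strcmp(codeset, "ANSI_X3.4-1968") == 0);

	dot = strchr(name, '.');
	if (dot == NULL)
		return (false);
	dot++;				/* skip the '.' itself */
	at = strchr(dot, '@');		/* an optional modifier ends the codeset */
	len = (at != NULL) ? (size_t)(at - dot) : strlen(dot);
	return (strlen(codeset) == len && strncmp(dot, codeset, len) == 0);
}
```

With this rule, "en_US.ASCII" is consistent with a reported codeset of
"ASCII", while "en_US.US-ASCII" is not.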

There might be other changes necessary if "US-ASCII" is changed; I'd
have to do a thorough review.

To get started, I think this needs to be decided:
A) Confirm we want locale -m and nl_langinfo(CODESET) to return
"ANSI_X3.4-1968" for C/POSIX locales
B) Confirm renaming US-ASCII locales to ASCII
C) (FreeBSD only) Decide if you want to conserve US-ASCII locales with
symlinks.  nl_langinfo(CODESET) will return "ASCII" for these symlinked
locales
D) Decide which "ASCII" locales are really needed.  (I suggest one,
en_US.ASCII)
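
For anyone who wants to see what a given system reports today before
weighing in on A and B, a small helper along these lines should do.  It
uses only the standard setlocale() and nl_langinfo() calls; codeset_of() is
just a name I picked for the example.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Switch LC_CTYPE to the named locale and return the codeset string
 * reported for it, or NULL if the locale cannot be selected.
 * Note that this changes the process-wide locale as a side effect.
 */
const char *
codeset_of(const char *locale_name)
{
	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return (NULL);
	return (nl_langinfo(CODESET));
}
```

Calling codeset_of("C") and printing the result shows the string under
discussion: per the thread, "US-ASCII" on FreeBSD as it stands and
"ANSI_X3.4-1968" on Linux.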

Thanks,
John