Why no non-latin TODIGIT mappings in UTF-8.src ?

Mon May 28 08:47:01 UTC 2007

* Andrey Chernov <ache at freebsd.org> [070528 09:28]:
> On Mon, May 28, 2007 at 12:41:42AM +0200, Wolfgang Zenker wrote:

>> I'm a bit surprised there are no TODIGIT mappings for non-latin scripts
>> in src/share/mklocale/UTF-8. Is there a technical reason why this would
>> be a bad idea, or is it simply because no one has got around to defining
>> the mappings yet?

> Because of POSIX isdigit():

> digit 
> Define the characters to be classified as numeric digits. 
> In the POSIX locale, only:

> 0 1 2 3 4 5 6 7 8 9

> shall be included.

> In a locale definition file, only the digits <zero>, <one>, <two>, 
> <three>, <four>, <five>, <six>, <seven>, <eight>, and <nine> shall be 
> specified, and in contiguous ascending sequence by numerical value. The 
> digits <zero> to <nine> of the portable character set are automatically 
> included in this class.

Looking at our UTF-8.src, I see

$ grep DIGIT UTF-8.src
DIGIT     '0' - '9'
XDIGIT    '0' - '9'  'A' - 'F'  'a' - 'f'
TODIGIT   < '0' - '9' : 0x0000 >
TODIGIT   < 'A' - 'F' : 10 > < 'a' - 'f' : 10 >

It appears to me that isdigit() behaviour is controlled by the DIGIT
keyword, not TODIGIT. However, I admit that I don't completely understand
how locale files are supposed to work. So where does e.g. iswdigit() get
its character class information from? Shouldn't that be somewhere in the
locale information as well?
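
One way to probe the distinction is a minimal C sketch like the one below.
The helper names in_narrow_digit_class and in_wide_digit_class are made up
for illustration. It reports class membership only, which is what the DIGIT
keyword would govern, and does no value conversion, which is what TODIGIT
would govern:

```c
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: 1 if c is in the narrow "digit" class, else 0.
 * The cast avoids undefined behaviour for negative char values. */
int in_narrow_digit_class(int c)
{
    return isdigit((unsigned char)c) != 0;
}

/* Hypothetical helper: 1 if wc is in the wide "digit" class, else 0.
 * Membership only; no numeric value is derived from wc. */
int in_wide_digit_class(wint_t wc)
{
    return iswdigit(wc) != 0;
}
```

After setlocale(LC_CTYPE, "") under a UTF-8 locale, calling
in_wide_digit_class(0x0663) (ARABIC-INDIC DIGIT THREE, assuming wchar_t
holds Unicode code points) would show whether the digit class reaches
beyond '0' - '9'. ISO C defines iswdigit() in terms of the ten decimal
digits only, so 0 is the likely answer, but that is not verified here for
FreeBSD.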

Wolfgang