Inconsistency in LC_CTYPE source files

Li-Lun Wang (Leland Wang) llwang at infor.org
Sat May 20 10:25:20 PDT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I have noticed that two of the LC_CTYPE source files for UTF-8,
UTF-8.src and zh_TW.UTF-8.src, are inconsistent with all the other
LC_CTYPE source files. The literals in every other LC_CTYPE source
file, including am_ET.UTF-8.src, are written in the native byte
sequence of that specific locale, whereas the literals in UTF-8.src
and zh_TW.UTF-8.src are written in Unicode. (Note that UTF-8 is NOT
the same as Unicode.)

This creates headaches for locale-aware applications that support
UTF-8. For example, the is*() and isw*() functions, such as
iswspace(), have to be used differently, and behave differently,
under the other UTF-8 locales than under every remaining locale,
am_ET.UTF-8 included. Under all other locales including am_ET.UTF-8,
the argument to the isw*() functions is the wide character literal in
that locale, whereas under the other UTF-8 locales the application
must first convert the wide character from UTF-8 to Unicode before
passing it to the isw*() functions.

Is there any good reason for this inconsistency? Shall we change
UTF-8.src and zh_TW.UTF-8.src so that their behavior is consistent
with the other locales?
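
To make the two representations concrete, here is a minimal sketch in
C. It assumes that "native byte sequence" means the encoded bytes of a
character packed into a single integer, most significant byte first;
that is only my reading of the source files, not something taken from
the libc code. The helper names pack_native_bytes() and
utf8_to_codepoint() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_INVALID 0xFFFFFFFFu

/* Hypothetical helper: pack the raw bytes of one multibyte character
 * (at most 4 bytes) into an integer, most significant byte first.
 * This models the ASSUMED "native byte sequence" form of a literal. */
uint32_t pack_native_bytes(const unsigned char *s, size_t len)
{
    uint32_t v = 0;

    assert(s != NULL);
    for (size_t i = 0; i < len && i < 4; i++)
        v = (v << 8) | s[i];
    return v;
}

/* Hypothetical helper: decode exactly one UTF-8 sequence of 1 to 4
 * bytes into a Unicode code point. Returns UTF8_INVALID if the lead
 * byte, the length, or a continuation byte is malformed. Overlong
 * forms and surrogates are not rejected in this sketch. */
uint32_t utf8_to_codepoint(const unsigned char *s, size_t len)
{
    uint32_t cp;
    size_t need;

    assert(s != NULL);
    if (len == 0)
        return UTF8_INVALID;

    if (s[0] < 0x80) {
        cp = s[0];
        need = 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        cp = s[0] & 0x1F;
        need = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        cp = s[0] & 0x0F;
        need = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        cp = s[0] & 0x07;
        need = 4;
    } else {
        return UTF8_INVALID;
    }

    if (len != need)
        return UTF8_INVALID;

    /* Each continuation byte contributes its low 6 bits. */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UTF8_INVALID;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    return cp;
}
```

For IDEOGRAPHIC SPACE, whose UTF-8 encoding is the bytes E3 80 80,
the first helper yields 0xE38080 and the second yields 0x3000. For
ASCII characters the two values coincide, which is why the difference
only shows up with non-ASCII input.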

Sincerely,
Li-Lun Wang
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEb1D7CQM7t5B2mhARAgMEAJ9FMpNx1IaUGIn0NNBaaHLj3DFQqACbBSJg
tWnXCT2N15U+SntjmuTrGjI=
=JNXG
-----END PGP SIGNATURE-----


More information about the freebsd-hackers mailing list