Re: Grep with non-ascii
Date: Sat, 04 Feb 2023 04:16:37 UTC
On Fri, 3 Feb 2023 12:36:47 -0500
George Mitchell <george+freebsd@m5p.com> wrote:

> On 2/3/23 11:06, Tomoaki AOKI wrote:
> > [...]
> > If this is the case like above, the only solution is to move to
> > character set containing ALL characters all over the world.
> >
> > AFAIK, the only candidates are only two, TRON code [1] and Unicode (UCS,
> > ISO/IEC 10646) [2]. And TRON code is very rarely used, actual candidate
> > would be Unicode only.
> > Note that Unicode is usually encoded to any of UTF-8, UTF-16 or UTF-32
> > for data transfer (sometimes raw UCS-2?).
> > [...]
>
> The one positive development in the world of computing that I would
> credit to Java is the earliest big push toward the adoption of UTF-8.
> I strongly hope UTF-8 becomes universally used sooner rather than
> later.                                                    -- George

And FreeBSD already has UTF-8. ;-)

Drawbacks of UTF-8 are...

 * Han unification. Characters in Japanese, Chinese and Korean that
   look alike but are not exactly the same were wrongly unified,
   which is a fatal flaw.

 * Lack of proper support for variant forms of characters.
   Maybe Unicode should have two more dimensions, one for
   distinguishing wrongly unified CJK characters and another for
   variants.

 * Font sets. Only a very limited number of fonts cover all Unicode
   code points that have an actual character assigned.

 * FreeBSD base does not yet have a full Unicode font for vt.

(Input methods are a different problem, though.)

-- 
Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
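
As a small illustration of the quoted remark that Unicode is usually encoded as UTF-8, UTF-16 or UTF-32, the sketch below serializes one code point in all three forms. It is not from the thread: the helper name `encoding_forms` is hypothetical, and U+76F4 was picked here only as an example of a unified Han character whose glyph is commonly drawn differently in Japanese and Chinese fonts.

```python
# Illustration only: one Unicode code point, three encoding forms.
# `encoding_forms` is a hypothetical helper, not part of any library.

def encoding_forms(ch):
    """Return the big-endian byte sequences of `ch` as hex strings."""
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16": ch.encode("utf-16-be").hex(),
        "utf-32": ch.encode("utf-32-be").hex(),
    }

# U+76F4: a single code point shared by Japanese and Chinese text.
# Which glyph is shown depends on the font, not on the encoding.
han = "\u76f4"

for name, hexbytes in encoding_forms(han).items():
    print(f"{name:7} {hexbytes}")
```

All three byte sequences decode back to the same code point, so switching between UTF-8, UTF-16 and UTF-32 does not affect the unification and variant issues listed above; those belong to the character set itself.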