Printing UTF-8 characters
Farhan Khan
khanzf at gmail.com
Wed Jun 20 01:34:31 UTC 2018
On Thu, Feb 1, 2018 at 10:51 PM, Bakul Shah <bakul at bitblocks.com> wrote:
> On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan <khanzf at gmail.com> wrote:
>> Sorry, that was a poorly phrased question on my part. Let me try again.
>> I am trying to make text align in columns in a terminal. My
>> understanding is that characters above 0x7E are 3 bytes in length. A
>> modern terminal will render that as either a single question-mark or
>> the character itself, making terminal column alignment easy. But how
>> would an older terminal display a 3-byte character? I am worried that
>> would render as 3 question marks and throw off column alignment. If
>> so, is there a proper way to perform alignment for both newer and
>> older terminals?
>
> UTF-8 can use up to 4 bytes to encode a Unicode code point,
> depending on the script.
>
> For what you want, you can use OpenOffice-like programs that
> understand Unicode and can do complex text layout. Normal
> terminal programs, which typically use monospace (fixed-width)
> fonts, are simply not capable of what you want. The assumption
> that one char means one rectangular cell on the screen is too
> deeply woven into them. Particularly for Indic languages this
> just doesn't work. You may have N Unicode code points, each of
> which requires 3 bytes, that all together map to a single glyph.
Hi all,
To follow up on my earlier, poorly asked question from a few months
back: how do I determine whether the terminal is capable of displaying
UTF-8 encoded strings and/or Unicode in general?
The obvious answer is to check the LANG variable via getenv(3), but
what if you are using "en_US.UTF-8" vs "en_GB.UTF-8"? Should I just
check for the string "UTF-8" in the LANG variable?
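For reference, here is a rough sketch of an alternative to parsing LANG
by hand. LC_ALL and LC_CTYPE take precedence over LANG, so it may be
simpler to let setlocale(3) resolve the environment and then ask
nl_langinfo(3) for the codeset. The helper names codeset_is_utf8() and
locale_is_utf8() are made up for illustration, and accepting the "UTF8"
spelling is an assumption about what some systems report. Note that this
only says what the locale claims, not what the terminal emulator will
actually render.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: return 1 if a codeset name denotes UTF-8.
 * The comparison is case-insensitive and also accepts "UTF8"
 * (an assumption about spellings seen on some systems).
 */
int
codeset_is_utf8(const char *codeset)
{
	if (codeset == NULL)
		return 0;
	return strcasecmp(codeset, "UTF-8") == 0 ||
	    strcasecmp(codeset, "UTF8") == 0;
}

/*
 * Hypothetical helper: adopt the LC_CTYPE locale from the environment
 * (LC_ALL, then LC_CTYPE, then LANG) and report whether its codeset
 * is UTF-8. Returns 0 if the locale cannot be set.
 */
int
locale_is_utf8(void)
{
	if (setlocale(LC_CTYPE, "") == NULL)
		return 0;
	return codeset_is_utf8(nl_langinfo(CODESET));
}
```

With this, "en_US.UTF-8" and "en_GB.UTF-8" are treated the same way,
since only the codeset part of the locale is examined.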
My concern is printing characters above 0x7F on terminals/encodings
that are not capable of displaying them, resulting in unusual
behavior.
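One possible fallback for that case, as a sketch only: when the locale
is not UTF-8, print an ASCII-only copy of the string with one '?' per
non-ASCII code point, so that the column count stays predictable. The
function name ascii_fallback() is hypothetical, and the code assumes the
input is valid UTF-8. It counts code points, not glyphs, so combining
sequences (as in the Indic case mentioned earlier) would still come out
as several '?' characters.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy `in` to `out`, writing one '?' for each
 * non-ASCII code point. Assumes `in` is valid UTF-8. At most outsz - 1
 * bytes are written, followed by a terminating NUL. Returns the number
 * of bytes written, not counting the NUL.
 */
size_t
ascii_fallback(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return 0;
	for (; *in != '\0' && n + 1 < outsz; in++) {
		unsigned char c = (unsigned char)*in;

		if (c < 0x80)
			out[n++] = (char)c;	/* plain ASCII, copy as-is */
		else if ((c & 0xC0) != 0x80)
			out[n++] = '?';		/* lead byte of a sequence */
		/* otherwise a continuation byte (10xxxxxx): skip it */
	}
	out[n] = '\0';
	return n;
}
```

The output of ascii_fallback() contains only bytes below 0x80, so it
should be safe to pad with spaces for alignment on any terminal.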
Thanks,
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE
More information about the freebsd-hackers mailing list