Printing UTF-8 characters

Farhan Khan khanzf at gmail.com
Wed Jun 20 04:21:19 UTC 2018


On Tue, Jun 19, 2018, 10:46 PM Conrad Meyer <cem at freebsd.org> wrote:

> You want LC_CTYPE.
>
> On Tue, Jun 19, 2018 at 6:38 PM Farhan Khan <khanzf at gmail.com> wrote:
>
>> On Thu, Feb 1, 2018 at 10:51 PM, Bakul Shah <bakul at bitblocks.com> wrote:
>> > On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan <khanzf at gmail.com>
>> wrote:
>> >> Sorry, that was a poorly phrased question on my part. Let me try again.
>> >> I am trying to make text align in columns in a terminal. My
>> >> understanding is that characters above 0x7E are 3 bytes in length. A
>> >> modern terminal will render that as either a single question-mark or
>> >> the character itself, making terminal column alignment easy. But how
>> >> would an older terminal display a 3-byte character? I am worried that
>> >> would render as 3 question marks and throw off column alignment. If
>> >> so, is there a proper way to perform alignment for both newer and
>> >> older terminals?
>> >
>> > UTF-8 can use up to 4 bytes to encode a Unicode code point,
>> > depending on the script.
>> >
>> > For what you want, you can use OpenOffice-like programs that
>> > understand Unicode and can do complex text layout. Normal
>> > terminal programs, which typically use monospace (fixed-width)
>> > fonts, are simply not capable of what you want. The assumption
>> > that one char means one rectangular cell on the screen is too
>> > deeply woven into them. Particularly for Indic languages this
>> > just doesn't work. You may have N Unicode code points, each of
>> > which requires 3 bytes, that all together map to one single glyph.
>>
>> Hi all,
>>
>> To follow up on my poorly asked question from a few months back,
>> how do I determine whether the terminal is capable of printing
>> UTF-8 encoded strings and/or Unicode in general?
>> The obvious answer is to check the LANG variable via getenv(3), but
>> what if you are using "en_US.UTF-8" vs "en_GB.UTF-8"? Should I just
>> check for the string "UTF-8" in the LANG variable?
>>
>> My concern is printing characters above 0x7F on terminals/encodings
>> that are not capable of displaying them, resulting in unusual
>> behavior.
>>
>> Thanks,
>>
>> --
>> Farhan Khan
>> PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE
>
Thanks Conrad!

I looked at how locale(1) works. Similar to what you suggested,
locale(1) does essentially this:

setlocale(LC_ALL, "");
charset = nl_langinfo(CODESET);

The result I needed is the codeset name left in 'charset'.
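
For anyone finding this thread later, here is a minimal sketch of how
those two calls might be wrapped into a check. The function names
codeset_is_utf8() and terminal_locale_is_utf8() are hypothetical, made up
for illustration. It uses LC_CTYPE as Conrad suggested rather than
LC_ALL, and it assumes the codeset is reported with the exact spelling
"UTF-8", which may not hold on every system. Note that this only reports
what the locale claims; it cannot prove the terminal actually renders
UTF-8.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the given codeset name is "UTF-8",
 * 0 otherwise. The exact-match comparison is an assumption; some
 * systems may spell the codeset differently.
 */
int
codeset_is_utf8(const char *charset)
{
	return (charset != NULL && strcmp(charset, "UTF-8") == 0);
}

/*
 * Hypothetical helper: adopt the character-type locale from the
 * environment (LC_ALL, LC_CTYPE, LANG, in that order of precedence)
 * and report whether its codeset is UTF-8. Returns 0 if the locale
 * cannot be set, so callers fall back to plain ASCII output.
 */
int
terminal_locale_is_utf8(void)
{
	if (setlocale(LC_CTYPE, "") == NULL)
		return (0);
	return (codeset_is_utf8(nl_langinfo(CODESET)));
}
```

A program would call terminal_locale_is_utf8() once at startup and, if
it returns 0, replace characters above 0x7F with an ASCII substitute
before printing.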

Thanks!


