patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French
Julian H. Stacey
jhs at berklix.com
Wed Nov 13 18:55:55 UTC 2013
Christian Weisgerber wrote:
> Julian H. Stacey:
>
> > I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> > national char set stuff as much as possible), but I want
>
> That is your problem right there.
My perspective & experience, or `problem' as you mislabel it, is that I
was supporting Unix internationalisation back in 1985, & have long since
tired of aggravating German umlaut issues (even back then umlauts
had AE OE UE [& SS] replacements, but few used them).
Your problem is that, being German, you had an incentive to attain umlauts,
& probably being younger, wasted less time achieving them by going
straight to the since-available UTF; but you are myopic that others may be
averse to wasting more time on national oddities that
cleaner Roman derivatives like Italian & English etc find superfluous.
It seemed best to make fmt.c 8 bit clean[er], to help process
arbitrary text, harm no one, & not disturb users of eg UTF.
Your problem is you would obstruct a cleaner fmt, so fmt continues
to fail until users are forced to waste their time too like you did,
reading & configuring internationalisation variables some don't need. **
> > to be able to edit files that simultaneously contain eg all
> > of English German & French etc, so setting some var to eg
> > just German would be inappropriate. 8 bit clean would be ideal,
> > next best would be my patches I suppose.
>
> You MUST define a character set for this. "8-bit clean" is meaningless
> for a tool that deals with runs of characters. Without a defined
> character set, you have no idea what those bytes mean. Is 0x90 a
Not true. See below. **
> printable character? Is it a control character? Is it part of a
> multibyte character?
>
> And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way
> limit you to German. For LC_CTYPE purposes, the language/country
> part of the locale specification isn't used.
>
> This is definitely a PEBKAC.
Avoid junk acronyms.
Re-Read original post
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
Particularly:
Example: Pasting notes into an xterm, clauses from
http://seafrance.com in English then French original &
German, to get the feel of what an unclear English translation
**:
Sometimes I mouse paste from Firefox in English, French, German &
other languages, making notes in a single file with vi in an
xterm, all with standard env. no Locale. & it edits OK in vi, &
displays with cat in xterm, till !}fmt in vi wraps long lines,
when fmt breaks it. So I fixed fmt.
It would not be appropriate to set a German locale, nor a French one etc.
Other utils might misbehave now or later. See eg man sort re LC_ALL.
No way I'd keep exiting vi & resetting LC_CTYPE between
mouse pastes from different language pages. The default American works fine.
I'm not bothered if vi+xterm might mis-display some odd accent,
as I can see something is there, so long as fmt does not strip the
accent, but FreeBSD fmt.c Does strip the French accents & German
umlauts, that's why I fixed fmt.c
Summary:
Making fmt.c 8 bit cleaner would not break UTF & Unicode, I believe,
so there is no reason to object to removal of fmt.c '& 0x7f' cruft etc.
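
To illustrate what a '& 0x7f' mask does to accented text, here is a minimal
sketch. It is not the actual fmt.c code: strip_high_bit() and
check_examples() are hypothetical names made up for this illustration, and
the byte values assume the usual ISO 8859-1 and UTF-8 encodings of e-acute.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from fmt.c): clears the top bit of
 * every byte, as a 7-bit-only filter would.
 */
static void
strip_high_bit(unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] &= 0x7f;
}

/* Hypothetical self-check showing the damage on two encodings. */
static void
check_examples(void)
{
	/* "caf" + e-acute in ISO 8859-1: e-acute is the single byte 0xE9. */
	unsigned char latin1[] = { 'c', 'a', 'f', 0xE9 };
	/* e-acute in UTF-8 is the two-byte sequence 0xC3 0xA9. */
	unsigned char utf8[] = { 0xC3, 0xA9 };

	strip_high_bit(latin1, sizeof(latin1));
	assert(latin1[3] == 'i');	/* 0xE9 & 0x7f == 0x69 */

	strip_high_bit(utf8, sizeof(utf8));
	assert(utf8[0] == 'C');		/* 0xC3 & 0x7f == 0x43 */
	assert(utf8[1] == ')');		/* 0xA9 & 0x7f == 0x29 */
}
```

In both cases the mask silently turns the accented character into
unrelated ASCII, whereas passing the bytes through untouched would leave
either encoding intact.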
Cheers,
Julian
--
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
Interleave replies below like a play script. Indent old text with "> ".
Send plain text, not quoted-printable, HTML, base64, or multipart/alternative.
Extradite NSA spy chief Alexander. http://berklix.eu/jhs/blog/2013_10_30