How to handle localized characters and special symbols?
Kövesdán Gábor
gabor.kovesdan at t-hosting.hu
Mon Feb 6 10:49:00 UTC 2006
Simon L. Nielsen wrote:
>On 2006.02.04 20:33:54 +0100, Kövesdán Gábor wrote:
>
>
>
>>I'm translating the FreeBSD webpage to Hungarian. I haven't done too
>>much so far, because I don't have too much spare time, but I'll finish
>>this translation. Today, I made a test build. You can see this here:
>>http://tux.t-hosting.hu/data
>>Most of it is still in English, but there are some translated
>>pages. The build went quite well: I found my mistakes easily and
>>managed to build the site, but I have trouble with one of the
>>localized characters. This is the letter o with a double acute accent
>>on it. Its standard HTML code is &#337;, but the SGML parser
>>substitutes it with a Q character. I don't see why this happens and
>>don't know how to fix it. There are two more problematic characters,
>>&reg; and &trade;. They are also substituted incorrectly. See:
>>http://tux.t-hosting.hu/data/about.html
>>You can notice the Z character with a ?? sign after the word Pentium and
>>a " after Athlon.
>>How could I correctly display these characters? Please tell me what to
>>do so that we have a nice Hungarian webpage. :)
>>
>>(I use Firefox and it selects the ISO-8859-2 Central European encoding
>>automatically.)
>>
>>
>
>I think the problem is that your web server forces a character set
>which prevents the character set in the HTML from taking effect:
>
>[simon at zaphod:~] fetch -o /dev/null -vv http://tux.t-hosting.hu/data/about.html |& grep Content-Type:
><<< Content-Type: text/html; charset=ISO-8859-2
>
>I'm not exactly sure how some of the other translations handle
>character sets other than ISO-8859-1, but since e.g. the ja and ru
>translations use something which definitely isn't Latin characters,
>I'm sure it can be done. See how those translations change the
>character set as needed.
>
>
>
I've found out that it's not just about the charset used by the browser.
The SGML parser substitutes &#337; with Q. If &#337; remained in the HTML
files, the browser would display it correctly. I tried putting this into
my Makefile to override the default in web.bsd.mk, hoping that the SGML
parser would no longer make this unwanted substitution:
SGMLNORMOPTS= -d ${SGMLNORMFLAGS} -c ${CATALOG} -D ${.CURDIR} -biso-8859-2
But it was no use.
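One possible explanation for the substitutions, offered only as a guess: if some step in the toolchain keeps just the low 8 bits of a numeric character reference and the result is then displayed as ISO-8859-2, the arithmetic reproduces the symptoms described in this thread. The sketch below assumes the reference in the source is the numeric form &#337;, and the helper name low_byte_as_latin2 is made up for illustration. It does not show what the SGML tools actually do internally.

```python
def low_byte_as_latin2(codepoint: int) -> str:
    """Keep only the low 8 bits of a code point and decode that byte as ISO-8859-2.

    This models a hypothetical truncation, not the real behaviour of any tool.
    """
    return bytes([codepoint & 0xFF]).decode("iso-8859-2")


if __name__ == "__main__":
    samples = [
        ("o with double acute", 337),   # U+0151
        ("trade mark sign", 0x2122),    # U+2122
        ("registered sign", 0xAE),      # U+00AE
    ]
    for name, cp in samples:
        print(f"{name}: U+{cp:04X} -> {low_byte_as_latin2(cp)!r}")
```

Under this assumption the first two come out as Q and a double quote, and the registered sign comes out as a Z with a caron, because byte 0xAE means a different character in ISO-8859-2 than in ISO-8859-1.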
I ran into a new problem recently, too. According to
http://www.w3.org/2003/entities/iso8879doc/isolat1.html the entities
&aacute; &eacute; etc. are accepted standards in XML, but
if I put these entities into an .xsl file, e.g. index.xsl, the web build
fails.
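For what it's worth, XML itself predefines only five named entities (lt, gt, amp, apos and quot), so a name like aacute has to be declared in a DTD before an XML parser accepts it, while numeric references need no declaration. Whether this is what breaks the web build is an assumption. The sketch below only shows the general parser behaviour using Python's standard library, and the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Named entity without a declaration: rejected as an undefined entity.
undeclared = "<p>&aacute;</p>"

# Numeric character reference: needs no declaration.
numeric = "<p>&#225;</p>"

# Named entity declared in the internal DTD subset: accepted.
declared = '<!DOCTYPE p [<!ENTITY aacute "&#225;">]><p>&aacute;</p>'

if __name__ == "__main__":
    for label, doc in [("undeclared", undeclared),
                       ("numeric", numeric),
                       ("declared", declared)]:
        print(label, parses(doc))
```

So in an .xsl file the options would be either numeric references or an entity declaration at the top of the file.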
Anyway, I've realized that if I simply write the character ő into the SGML
sources it stays intact, but I don't know how standard and portable
this solution is. I would like to make my work as standard and portable
as possible.
As for the Russian website, they just type their characters according to
their charset, and I see strange characters in the sources. It
definitely works, but isn't there a more elegant solution? Like
&aacute; instead of á, &eacute; instead of é, etc.
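To illustrate the kind of conversion meant here, a minimal sketch in Python: it rewrites every non-ASCII character as a named HTML entity where HTML 4 defines one, and as a numeric reference otherwise. The function name to_entities is made up for this example and is not part of the FreeBSD doc tools. Note that ő has no named HTML 4 entity, so it falls back to &#337;, which is exactly the form that got substituted with Q above.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entity references.

    ASCII characters, including & and <, are left untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            # HTML 4 named entity, e.g. aacute for U+00E1
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity: fall back to a decimal reference
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    print(to_entities("á é ő"))
```

This keeps the sources pure ASCII, at the cost of readability for the translator.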
Thanks,
Gabor