tidy flag
Hiroki Sato
hrs at FreeBSD.org
Thu Feb 5 16:50:15 UTC 2004
Alexey Zelkin <phantom at FreeBSD.org.ua> wrote
in <20040205063847.GA13136 at phantom.cris.net>:
phantom> On Wed, Feb 04, 2004 at 11:27:03PM +0100, Alex Dupre wrote:
phantom> > Ok, the question then becomes: is it possible to replace the -preserve
phantom> > tidy-stable flag with the -numeric tidy-devel flag? Otherwise can you
phantom> > send me a practical example where -preserve is needed? We (Thierry Thomas
phantom> > and I) will try it ourselves.
phantom>
phantom> Well. Try the html code below with -preserve and without. You'll see a
phantom> difference. Actually the most annoying thing was 'entity expansion', but
phantom> there were also some problems with non-ASCII symbol processing under
phantom> some conditions (but unfortunately I don't remember the details).
phantom>
phantom> <html>
phantom> <body>
phantom> NBSP - &nbsp;
phantom> COPY - &copy;
phantom> </body>
phantom> </html>
The problem is that the result of the expansion should depend
on the html doc's charset/encoding. For example, in euc-jp, &copy;
should be {0x8f, 0xa2, 0xed}, but tidy always treats it as 0xa9.
And many browsers interpret &copy; as a raw character in the html
doc's charset (euc-jp, in this case). &nbsp;, &copy;, &middot;, and
other >159 characters in euc-jp are different from iso-8859-*.
While the XML specification makes this unambiguous (&#xxx;
is always interpreted as a Unicode character), I think it is better
to preserve entities as they are for the time being. Tidy does
not know the relationship between euc-jp and Unicode, so a lot of
Japanese docs will be broken without -preserve.
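
To make the byte-level difference concrete, here is a small Python
sketch. It is only an illustration and has nothing to do with how tidy
works internally; it assumes Python's standard "euc_jp" and
"iso-8859-1" codecs, and the variable names are made up for this
example.

```python
# The character that the &copy; entity names (U+00A9 COPYRIGHT SIGN).
COPYRIGHT = "\u00a9"

# Expanding the entity correctly means encoding the character in the
# document's own charset, not assuming one fixed byte value.
latin1_bytes = COPYRIGHT.encode("iso-8859-1")
eucjp_bytes = COPYRIGHT.encode("euc_jp")

# In iso-8859-1 the copyright sign is a single byte.
print("iso-8859-1:", [hex(b) for b in latin1_bytes])

# In euc-jp it lives in JIS X 0212, so it needs the 0x8f prefix
# followed by two more bytes.
print("euc-jp:    ", [hex(b) for b in eucjp_bytes])

# Writing the iso-8859-1 byte into a euc-jp document does not produce
# the same bytes as a correct expansion would.
print("same bytes:", latin1_bytes == eucjp_bytes)
```

Writing the single iso-8859-1 byte into a euc-jp document is the kind
of breakage -preserve avoids, because the entity is left unexpanded.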
--
| Hiroki SATO