[PATCH] docproj port needs to use tidy-devel

Gábor Kövesdán gabor at FreeBSD.org
Sat Jan 26 22:44:44 UTC 2008


Murray Stokely wrote:
> Is there any reason not to update the docproj port to use tidy-devel rather
> than tidy?  The released version of tidy is nearly 8 years old and produces
> xhtml that doesn't validate.  The newer -devel releases produce more correct
> xhtml.
>   
First, sorry for the late answer. It is not just the XHTML: the HTML 
output of tidy is incorrect as well and does not validate. (I think 
www/63552 is related, because such errors don't appear without tidy.) 
However, the newer tidy versions completely mess up character sets. They 
certainly break the Hungarian character set, and I suspect others are 
affected, too. The only reason we don't disable tidy in the Hungarian 
project is that the builder has an ancient version, which works fine. 
Besides, different versions of tidy have different sets of command-line 
options, which makes our toolchain less portable.
But anyway, why do we really need tidy? I ran some tests without tidy 
earlier, and the only thing I had to do to generate valid pages was to 
edit the DTD in place. Because sgmlnorm outputs our custom DTD, the 
web pages were not valid, but after replacing it with the HTML 4.01 
Transitional DTD, everything validated. I'd prefer to see tidy go away.
Yes, I know that one reason for tidy is the indentation and line breaking 
in the HTML code; the output of sgmlnorm is not meant for human 
consumption. But can't we do that in a simpler way?
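
As a rough illustration of the DTD replacement described above, here is a
minimal Python sketch. It assumes the sgmlnorm output starts with a single
DOCTYPE declaration that has no internal subset; the function name
replace_doctype and the regular expression are hypothetical and are not
part of the existing toolchain.

```python
import re

# Public identifier and system URL of the W3C HTML 4.01 Transitional DTD.
HTML401_TRANSITIONAL = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">'
)

# Matches a DOCTYPE declaration; assumes it contains no '>' before its end
# (that is, no internal subset).
_DOCTYPE_RE = re.compile(r'<!DOCTYPE[^>]*>', re.IGNORECASE)


def replace_doctype(html: str) -> str:
    """Swap the first DOCTYPE declaration for HTML 4.01 Transitional.

    Documents without a DOCTYPE are returned unchanged.
    """
    return _DOCTYPE_RE.sub(HTML401_TRANSITIONAL, html, count=1)
```

In a real build the same substitution could be done by whatever in-place
editing step the toolchain already uses; the sketch only shows the idea.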

One more idea that came to mind about this. Currently, our web pages 
are not uniform. We use HTML 4.01 for pages generated from .sgml and 
XHTML 1.1 for .xsl output. What do you think about using XHTML 1.1 
uniformly? Obviously, sgmlnorm cannot do that, but there are advantages 
to using XML-based technologies. Well, I'm just an enthusiastic newbie 
when it comes to XML, but I think it would make sharing data between our 
pages easier. Plus, we could simplify our infrastructure, as we would 
only need the XML tools and one DTD for building web pages, with no more 
conditional cases in .ent files, like this one in header.ent:

<![ %xml.features; [
<!ENTITY header1.meta '
  <meta http-equiv="Content-Type" content="text/html; charset=&xml.encoding;" />
  <meta name="MSSmartTagsPreventParsing" content="TRUE" />
'>
]]>
<!ENTITY header1.meta '
  <meta http-equiv="Content-Type" content="text/html; charset=&xml.encoding;">
  <meta name="MSSmartTagsPreventParsing" content="TRUE">
'>

Also, XHTML is easier to validate and stricter, yet no more difficult to 
edit. It is also supposed to supersede HTML (with the HTML5 draft that is 
no longer so certain, but that has nothing to do with this topic or with 
XHTML's advantages), and it is a newer standard to conform to.
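
To illustrate the strictness point, here is a small Python sketch that
checks XML well-formedness with the standard library parser. Note that
well-formedness is only a precondition for validity, not validation
against the XHTML DTD; the helper name is_well_formed is hypothetical,
and the two sample strings are modelled on the header.ent entity above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML style: the empty element is explicitly closed with "/>".
xhtml_head = '<head><meta name="MSSmartTagsPreventParsing" content="TRUE" /></head>'

# HTML 4.01 style: the meta element has no end tag, which an XML parser rejects.
html_head = '<head><meta name="MSSmartTagsPreventParsing" content="TRUE"></head>'

print(is_well_formed(xhtml_head))
print(is_well_formed(html_head))
```

Any generic XML tool can catch this class of error in XHTML, whereas the
HTML form needs an SGML-aware parser to be checked at all.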

So I think it would be a good idea. Maybe polishing the pages this way 
would be a good SoC project for me: I'm interested, I want to learn more 
about XML, and I want to participate in the upcoming SoC again. Another 
item would be to move the doc repo to DocBook 5 / XML.

If this whole XML topic has been discussed before, please forgive me; 
I missed it.

Regards,

-- 
Gabor Kovesdan

EMAIL: gabor at FreeBSD.org
WWW:   http://www.kovesdan.org



