Tidy and HTML tab spacing
Warren Block
wblock at wonkity.com
Wed Jan 18 22:49:50 UTC 2012
HTML versions of FreeBSD documents are fed through tidy (www/tidy or
www/tidy-devel) for cleanup. There's a bug in tidy[1] that can cause tab
stops to be wrong:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623
Note how the values for DISTNAME and EXTRACT_SUFX do not line up. They
do line up in the source, book.sgml.
So what to do?
1. It might be possible to fix tidy. This would be the neatest
solution (see [1]).
2. An option could be added to tidy to ignore tabs. The HTML standard
"strongly discourages" tabs in PRE elements[2], but does not disallow
them. Keeping actual tabs also benefits users: they can cut-and-paste
Makefile examples with the tabs intact, or just drag-select them to
see where the embedded tabs are.
3. Tidy could be replaced with some other tool. However, the others
I've found have additional dependencies on either PHP or Java, so I
did not test them for correct handling of tabs[3],[4]. Either
dependency adds overhead, not just for doc build machines but for
anyone who wants to work on FreeBSD documentation.
4. Add newlines to the HTML in the build process before it gets to
tidy, so that the '>' closing the PRE tag no longer shares a line with
the first line of the listing:
s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/
5. Don't tidy HTML files at all (suggested as an option by Benedict
Reuschling). The unprocessed HTML is ugly, but few people are going
to look at it directly. Files that haven't been through tidy are a
little larger, about 4% in the case of the Porter's Handbook.
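
As a rough illustration of option 4, the substitution could be run as a
small filter ahead of tidy. This is only a sketch: the function name
move_tag_close is made up here, the sample input follows the pre-tidy
output shown in footnote [1], and the tab after each '=' is assumed from
the description rather than copied from the real file.

```python
import re

# Pattern from option 4: the tag's closing '>' starts the line that
# also holds the first line of the listing.
PATTERN = re.compile(r'CLASS="PROGRAMLISTING"\n>')


def move_tag_close(html: str) -> str:
    """Move '>' up to the tag line so listing text starts at column 0."""
    return PATTERN.sub('CLASS="PROGRAMLISTING">\n', html)


# Assumed sample input; the tabs after '=' are an assumption.
sample = (
    '<PRE\n'
    'CLASS="PROGRAMLISTING"\n'
    '>DISTNAME=\tfoo\n'
    'EXTRACT_SUFX=\t.tgz</PRE\n'
    '>'
)

fixed = move_tag_close(sample)
print(fixed)
```

HTML 4.01 says a line break immediately following a start tag is to be
ignored, so the extra newline should not show up in the rendered
listing, but that would need checking against real browsers and against
tidy itself.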
Footnotes:
[1] In www/tidy-devel, line 355 of streamio.c does not realize that
characters at the beginning of the line may be inside a tag and should
not count as visible. The pre-tidy HTML output of the example above is
----
<PRE
CLASS="PROGRAMLISTING"
>DISTNAME= foo
EXTRACT_SUFX= .tgz</PRE
>
----
The '>' before DISTNAME is being wrongly counted toward the tab stop.
See http://www.wonkity.com/~wblock/tidy/ for a slightly more detailed
example. Tidy is mature software, and there's been a bug report for this
problem in the bug database since 2008:
https://sourceforge.net/tracker/?func=detail&aid=1885471&group_id=27659&atid=390963
So bug fixes in this area from the tidy project are unlikely.
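
The effect of counting the '>' can be sketched with ordinary tab
expansion. This is not tidy's code, only a model of the miscount
described above. The tab width of 8 and the WRKSRC line are assumptions
chosen so that the extra character pushes the text across a tab stop;
they are not taken from the Porter's Handbook example.

```python
TAB_WIDTH = 8  # assumed tab-stop width, for illustration only


def expand_correctly(line: str) -> str:
    """Expand tabs with the visible text starting at column 0."""
    return line.expandtabs(TAB_WIDTH)


def expand_counting_tag_close(line: str) -> str:
    """Expand tabs as if the tag's '>' occupied the first column,
    then drop the '>' again, since it is not visible text."""
    return (">" + line).expandtabs(TAB_WIDTH)[1:]


line = "WRKSRC=\tx"  # hypothetical line: 7 characters before the tab

print(repr(expand_correctly(line)))
print(repr(expand_counting_tag_close(line)))
```

In this model the first call pads to the tab stop at column 8, while the
second treats the prefix as 8 characters wide and pads on to column 16,
leaving the value 7 columns further right than it should be.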
[2] http://www.w3.org/TR/html401/struct/text.html#edef-PRE
[3] http://htmlpurifier.org/
[4] http://htmlcleaner.sourceforge.net/index.php