RFC: doc/www cleanup
Gabor Kovesdan
gabor at FreeBSD.org
Fri Aug 3 11:13:05 UTC 2012
Hi Doc Fellows,
the XML migration now in progress is also a big cleanup that
will probably simplify documentation authoring. While working on it
I've encountered several old constructs and several things that
made me think about further directions. I'd like to discuss these changes
with you before proceeding with them:
1, Removing emacs PSGML comments: PSGML is an emacs mode for SGML
editing. Its behavior can be controlled either by SGML
comments or by a separate configuration file (described in
fdp-primer). Our documentation is littered with PSGML comments like this:
<!--
Local Variables:
mode: sgml
sgml-indent-data: t
sgml-omittag: nil
sgml-always-quote-attributes: t
End:
-->
XML requires tags to be closed and attributes to be always quoted, so
these settings lose most of their utility, and the comments just confuse
people who don't know what they mean. Indentation or any other specific
option can be configured in the .emacs file. I propose dropping these comments.
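
A minimal sketch of how such comments could be stripped automatically, in
Python. It assumes that every PSGML comment is an SGML comment whose body
starts with "Local Variables:" and ends with "End:", as in the example above.
The function name and the sample text are made up for illustration.

```python
import re

# Matches one comment whose body is an Emacs "Local Variables: ... End:" block.
# The (?:(?!-->).)*? part stops the match from running past the end of the
# comment, so ordinary comments elsewhere in the file are left alone.
PSGML_COMMENT = re.compile(
    r"<!--\s*Local Variables:(?:(?!-->).)*?End:\s*-->[ \t]*\n?",
    re.DOTALL,
)


def strip_psgml_comments(text: str) -> str:
    """Return text with Emacs local-variable comments removed."""
    return PSGML_COMMENT.sub("", text)


sample = """<para>Some text.</para>
<!-- an ordinary comment that must survive -->
<!--
     Local Variables:
     mode: sgml
     sgml-indent-data: t
     End:
-->
"""

print(strip_psgml_comments(sample))
```

Any real run over the tree would still need a review of the resulting diff,
since comments that deviate from this shape are silently skipped.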
2, Relaxing character entity usage: To be able to read non-ASCII
characters on ASCII-only systems, we have been using character entities,
like &aacute;. But in the CJK languages, Greek and Russian every character
is non-ASCII, so entities are not practical there and were never used. In
practice they are only used in translations with ISO-8859 encodings (other
than Greek, which also belongs to this family). Displaying these
Latin-based characters is no longer much of a problem nowadays. Furthermore,
if you edit text in a given language, we can assume that you understand the
language, so you know what you should see and how to configure your
system if you don't see the desired result. As a result, these entities
no longer have any real advantage, but they heavily
"pollute" the text and make it much harder to edit and read. One
exception is a character that is not part of the language of the document,
e.g. a non-English developer name in the English documentation.
So I propose that every translation convert entities back to
normal characters and keep only those for characters that aren't present
in the given language. The abundance of character entities has been a
difficulty for new documentation people, especially for those who
don't have much of an IT background. This change would make the texts
more natural.
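
To illustrate the rule proposed in this item, here is a rough Python sketch
that expands a named entity only when the resulting character can be
represented in the encoding of the file. It makes some assumptions: it uses
Python's HTML entity table as a stand-in for the entity sets our tree actually
declares, it treats "present in the given language" as "encodable in the
file's encoding", and the function name is invented for the example.

```python
import re
from html.entities import name2codepoint

# XML markup escapes: these must stay entities in any encoding.
KEEP = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text: str, encoding: str) -> str:
    """Replace named character entities by literal characters where the
    target encoding can represent them; leave everything else untouched."""

    def repl(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)      # markup escape or unknown entity
        char = chr(name2codepoint[name])
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return match.group(0)      # not available in this encoding
        return char

    return ENTITY.sub(repl, text)


line = "G&aacute;bor K&ouml;vesd&aacute;n &amp; friends, 10 &euro;"
print(expand_entities(line, "iso-8859-1"))
print(expand_entities(line, "ascii"))
```

With iso-8859-1 the accented letters become literal characters while &amp;
and &euro; stay as entities; with ascii nothing changes. Project-specific
entities (anything not in the table) are left alone by design.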
3, Preferring XML/XSLT over scripts: Some parts of the website, like the A-Z
index and sitemap pages, have their own format that is processed with
shell scripts. It would be more consistent to use an XML data file with
an XSLT stylesheet for this purpose. It would give us more flexibility
for further changes and would reduce the number of different methods we
use to generate things.
4, Stricter XHTML: I don't propose going directly to XHTML 1.0 Strict, but
there are very inconsistently marked up <hr/>'s, <table>'s, etc. I would
like to make them more consistent and prefer CSS styling where
applicable. There are also empty paragraphs used as line breaks, which
should be eliminated as well. This would give us a more consistent look and
more structure-oriented webpage files.
And after the migration, I plan:
5, Identifying obsolete webpages: There are moved pages, both in the
English pages and in the translations, that serve only as redirects. These
pages were moved a very long time ago, so any interested party has had
plenty of time to update her bookmarks. I would like to remove these at
last. In addition, there are leftovers in translations, i.e. pages that
were removed from the English web but not from the translations. I would
like to generate a list of them and send patches to translation projects to
clean these up.
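
The list of leftovers could be generated roughly as follows. This is only a
sketch in Python: it assumes that a translation mirrors the relative paths of
the English tree, and the directory names (en, hu) and file names in the demo
are hypothetical.

```python
import tempfile
from pathlib import Path


def relative_files(root: Path) -> set:
    """All regular files below root, as POSIX paths relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def find_leftovers(english_root, translation_root) -> list:
    """Files that exist in the translation but no longer in the English tree."""
    extra = relative_files(Path(translation_root)) - relative_files(Path(english_root))
    return sorted(extra)


# Demo on a throwaway tree: old/moved.xml exists only in the translation.
with tempfile.TemporaryDirectory() as tmp:
    en, hu = Path(tmp, "en"), Path(tmp, "hu")
    for root, names in ((en, ["index.xml"]), (hu, ["index.xml", "old/moved.xml"])):
        for name in names:
            target = root / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("<html/>\n")
    leftovers = find_leftovers(en, hu)

print(leftovers)
```

The output is only a list of candidates. A translation may legitimately
contain files that have no English counterpart, so each entry would still have
to be checked by the translation project before removal.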
Thanks in advance for your comments,
Gabor