Validating docbook articles...

Chuck Swiger cswiger at mac.com
Mon Feb 23 22:07:28 UTC 2004


Dag-Erling Smørgrav wrote:
> Chuck Swiger <cswiger at mac.com> writes: 
>>How does one generate proper SystemLiterals per:
>>[...]
>>Are these entities published via a URI, or does one need to refer to a
>>local path?
> 
> The system literal can be anything as long as you have a catalog that
> reveals the real location of the external entity.  The usual practice
> for entities that rarely change is to create an online repository and
> let the system literal point to that.  In this case though you might
> as well use an empty or intentionally meaningless string.

Hmm.  Thanks for the response, which is helpful but seems incomplete from the 
standpoint of compatibility with the existing SGML build using nsgmls.

Specifically, I can add an empty "" pair as the system literal, but xmllint
complains about an invalid URI, and nsgmls isn't any happier.  Using file URIs
works for xmllint, but not for nsgmls; using raw pathnames almost works for
both, i.e. something like:

<?xml version='1.0' encoding='UTF-8'?>

<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN"
"/usr/local/share/sgml/docbook/4.1/docbook.dtd" [
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN" 
"/usr/doc/share/sgml/man-refs.ent">
%man;
<!ENTITY % freebsd PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD 
Entities//EN" "/usr/doc/share/sgml/freebsd.ent">
%freebsd;
<!ENTITY % trademarks PUBLIC "-//FreeBSD//ENTITIES DocBook Trademark 
Entities//EN" "/usr/doc/share/sgml/trademarks.ent">
%trademarks;
]>

...(rather than "file:///usr/doc...") results in:

170-sec% make lint
/usr/local/bin/nsgmls -wempty -wunclosed -s -D 
/usr/obj/usr/doc/en_US.ISO8859-1/articles/fb -c 
/usr/doc/en_US.ISO8859-1/share/sgml/catalog -c /usr/doc/share/sgml/catalog -c 
/usr/local/share/sgml/iso8879/catalog -c /usr/local/share/sgml/jade/catalog -c 
/usr/local/share/sgml/catalog.ports 
/usr/doc/en_US.ISO8859-1/articles/fb/article.sgml
/usr/local/bin/nsgmls:/usr/doc/en_US.ISO8859-1/articles/fb/article.sgml:173:17:E: element "DEVICENAME" undefined
/usr/local/bin/nsgmls:/usr/doc/en_US.ISO8859-1/articles/fb/article.sgml:175:27:E: element "DEVICENAME" undefined
[ ... ]

...whereas not using a SystemLiteral with the DOCTYPE declaration works fine 
with nsgmls but xmllint refuses to parse the document.  Am I wrong in 
concluding that by requiring a SystemLiteral for a document that is valid 
SGML, XML fails design goal #3, aka "XML shall be compatible with SGML"...?
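
The SystemLiteral requirement can be reproduced outside the doc build.  What
follows is a minimal sketch, assuming Python's standard xml.etree.ElementTree
(an expat-based parser) as a stand-in for xmllint; the is_well_formed helper
is hypothetical and only the two identifiers are taken from the DOCTYPE above:

```python
import xml.etree.ElementTree as ET

# Identifiers copied from the DOCTYPE declaration quoted above.
PUBLIC_ID = "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN"
SYSTEM_ID = "/usr/local/share/sgml/docbook/4.1/docbook.dtd"


def is_well_formed(document: str) -> bool:
    """Hypothetical helper: True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# SGML-style DOCTYPE: public identifier only, no system literal.
public_only = f'<!DOCTYPE article PUBLIC "{PUBLIC_ID}">\n<article/>'

# XML-style DOCTYPE: public identifier followed by a system literal.
# The parser does not fetch the DTD here; it only checks the syntax.
public_and_system = (
    f'<!DOCTYPE article PUBLIC "{PUBLIC_ID}" "{SYSTEM_ID}">\n<article/>'
)

print(is_well_formed(public_only))
print(is_well_formed(public_and_system))
```

This only checks well-formedness, not validity against the DTD, so it says
nothing about how nsgmls treats the same declarations.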

Anyway, using explicit SystemLiterals with xmllint gives me:

180-sec% xmllint article.sgml
/usr/doc/share/sgml/freebsd.ent:26: parser error : Entity value required
<!ENTITY rel.current CDATA "5.2">
                      ^
/usr/doc/share/sgml/freebsd.ent:26: parser error : Space required before 'NDATA'
<!ENTITY rel.current CDATA "5.2">
                      ^
/usr/doc/share/sgml/freebsd.ent:26: parser error : xmlParseEntityDecl: entity rel.current not terminated
                       ^
[ ... ]

I can edit freebsd.ent to use the "<![CDATA[ ... ]]>" syntax, or else remove 
the CDATA keyword from the declaration entirely, which gives me:

Entity: line 5: parser error : Entity 'trade' not defined
   designations have been followed by the <quote>™</quote> or the
                                                        ^
Entity: line 6: parser error : Entity 'reg' not defined
   <quote>®</quote> symbol.</para>
               ^
Entity: line 6: parser error : chunk is not well balanced
   <quote>®</quote> symbol.</para>
                                      ^
article.sgml:33: parser error : chunk is not well balanced
       &tm-attrib.general;
                          ^
article.sgml:210: parser error : Entity 'prompt.root' not defined
     <screen>&prompt.root; <userinput>sysctl net.link.ether.bridge.config=fxp0:0,
                          ^
article.sgml:211: parser error : Entity 'prompt.root' not defined
&prompt.root; <userinput>sysctl net.link.ether.bridge.ipfw=1</userinput>
              ^
article.sgml:212: parser error : Entity 'prompt.root' not defined
&prompt.root; <userinput>sysctl net.link.ether.bridge.enable=1</userinput></scre
              ^
article.sgml:219: parser error : Entity 'nbsp' not defined
       <para>If you have &os; 5.1-RELEASE or previous the sysctl variables
                                  ^
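
Both kinds of error above can be shown in isolation.  This is a rough sketch,
again assuming Python's standard xml.etree.ElementTree rather than xmllint;
the parse_text helper and the small <para> documents are made up for
illustration, and the nbsp declaration is one possible XML definition, not the
one the FreeBSD .ent files use:

```python
import xml.etree.ElementTree as ET


def parse_text(document: str):
    """Hypothetical helper: the root element's text, or None if rejected."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# SGML allows a CDATA keyword in an entity declaration; XML does not.
sgml_style = (
    '<!DOCTYPE para [<!ENTITY rel.current CDATA "5.2">]>'
    "<para>&rel.current;</para>"
)
xml_style = (
    '<!DOCTYPE para [<!ENTITY rel.current "5.2">]>'
    "<para>&rel.current;</para>"
)

# XML predefines only lt, gt, amp, apos and quot; anything else, including
# nbsp, has to be declared before it is referenced.
undeclared = "<para>5.1&nbsp;RELEASE</para>"
declared = (
    '<!DOCTYPE para [<!ENTITY nbsp "&#160;">]>'
    "<para>5.1&nbsp;RELEASE</para>"
)

for name, doc in [("sgml_style", sgml_style), ("xml_style", xml_style),
                  ("undeclared", undeclared), ("declared", declared)]:
    print(name, repr(parse_text(doc)))
```

In the real documents these declarations live in external .ent files, so the
equivalent fix means changing those shared files rather than one article.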

> You'll want to generate a catalog that looks like this:
[ ...thanks for the example, which I will investigate further... ]

This has been interesting, but it's demonstrably non-trivial to convert SGML 
DocBook articles into XML.  More specifically, I don't see how to do so for a 
particular article without making non-local changes to the .ent files 
referenced by the article in order to make the XML version work at all, and I 
don't see how to make both nsgmls and xmllint happy at the same time.

Are these conclusions valid, or am I wrong?  :-)

-- 
-Chuck

PS: The problem I want to solve is simply that I want the DocBook system to 
output valid XHTML according to the W3C validator tool.  I'm willing to accept 
that using xmllint on an XML source document to get XHTML content is probably 
more straightforward than using nsgmls+tidy on an SGML source document, but 
that's not very useful if the conversion to XML breaks existing SGML documents 
until they also are converted to XML...




