PDF to HTML translations
Chip Camden
sterling at camdensoftware.com
Mon Sep 6 19:04:42 UTC 2010
Quoth Chad Perrin on Monday, 06 September 2010:
> On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> > On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > > able to find in ports? I went looking for pdf2html, expecting to find
> > > > that there, but no luck. Before I spend hours sifting through, still
> > > > without knowing whether I missed something that should be obvious,
> > >
> > > Yes, you did. :-)
>
> Apparently not. See below.
>
>
> > >
> > > > I
> > > > figured I'd ask here whether anyone knows of something off the top of
> > > > his/her head.
> > >
> > > Try textproc/pdftohtml
> >
> > Uhm, he said "other than pdftohtml" so I suspect he already knew about
> > that one.
>
> This is indeed the case.
>
> I appreciate the several suggestions I've received, though I see in
> retrospect that I haven't been sufficiently specific, since I have not
> gotten any suitable answers.
>
> I have "inherited" a Perl script that wraps pdftohtml. The reason a
> wrapper is needed is that a substantial amount of cleanup work is needed
> to produce HTML suitable to our final needs. The output of pdftohtml is
> sufficiently far from "perfect" that I would like to test the output of a
> few other possible "back ends" for the script to see if a significant
> amount of work being done by the script can be eliminated.
>
> Toward that end, the simpler the tool the better -- and the tool on the
> "back end" should not be something that must be contacted across a
> network, or that cannot be redistributed freely. I wanted to start with
> things I have in the base system on my FreeBSD laptop (where I'm doing my
> development) or through ports. OpenOffice.org is quite a bit larger and
> more unwieldy than we would really want to deal with at this point.
> Using Google or Adobe tools online is well outside the range of what we
> need (requiring network access for the tool to work).
>
> I've started looking at the Xpdf tools as well as pdftohtml. Other
> suggestions from within ports would be appreciated. Additional options
> other than what can be found in ports might also be useful, understanding
> the needs I sketched out above. The script itself is Perl, in case that
> matters.
>
> To everyone who has replied so far: thank you for your time.
>
> --
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]
How about rolling your own with print/p5-PDFLib (Perl) or
print/pecl-pdflib (PHP)? That may be more work than you wanted, though.
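
If the aim is just to compare a few back ends before deciding, a small
harness can run each candidate on the same PDF and capture what it emits.
Here is a minimal sketch in Python (the same shape carries over to a Perl
wrapper), assuming each tool can write to stdout. The CANDIDATES table,
run_backends and summarize are hypothetical names, and the flags are the
ones I believe pdftohtml and Xpdf's pdftotext accept, so check the man
pages of the installed versions first:

```python
import shutil
import subprocess

# Hypothetical candidate table: name -> argv, with "{pdf}" as a placeholder.
# The flags are assumptions about the installed tools; verify them locally.
CANDIDATES = {
    "pdftohtml": ["pdftohtml", "-stdout", "-noframes", "-i", "{pdf}"],
    "pdftotext": ["pdftotext", "-htmlmeta", "{pdf}", "-"],
}


def run_backends(pdf_path, candidates=CANDIDATES):
    """Run each installed back end on pdf_path.

    Returns {name: stdout text}, or None for a tool that is missing
    or that exits with a non-zero status.
    """
    results = {}
    for name, argv in candidates.items():
        if shutil.which(argv[0]) is None:
            results[name] = None  # tool not installed; skip it
            continue
        cmd = [arg.replace("{pdf}", pdf_path) for arg in argv]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.stdout if proc.returncode == 0 else None
    return results


def summarize(results):
    """Crude comparison metric: (character count, '<' count) per back end."""
    return {
        name: (len(out), out.count("<"))
        for name, out in results.items()
        if out is not None
    }
```

Calling summarize(run_backends("sample.pdf")) gives a rough idea of how
much markup each tool produces. Judging how much cleanup the wrapper
could drop still means reading the output by eye.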
--
Sterling (Chip) Camden | sterling at camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com | http://chipsquips.com
More information about the freebsd-questions mailing list