any way to turn a pdf file into something OCR-able?
Robert Huff
roberthuff at rcn.com
Mon Dec 1 17:23:15 PST 2008
Roland Smith writes:
> > pdftotext fails on the large [32MB] file I've got. Is there any
> > other way I can translate this huge textfile to ascii or html or
> > text?
>
> Please define "fail" in this context? I've used pdftotext on
> documents exceeding 40MB. However, there are of course things that
> don't work:
>
> 1) Some PDFs are just wrappers around JPEG images. In this case
> there is no text for pdftotext to convert => epic fail.
In this case, "convert" from the ImageMagick port will turn the PDF
into a series of image files (.jpg, .gif, or whatever format you
choose), which can then be fed to an OCR program. Read the manual
carefully before attempting; also note this can be a slow process.
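
As a rough illustration of the "convert" approach, here is a minimal
Python sketch that builds and runs such a command. The helper names,
the 300 dpi density, the PNG output format and the "page-%03d" file
pattern are all assumptions chosen for the example, not something
from the thread; it also assumes ImageMagick's "convert" is on the
PATH and can read PDFs (which normally requires Ghostscript).

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, density=300, fmt="png"):
    """Build the argv list for an ImageMagick `convert` call.

    The function name and defaults are hypothetical. `-density` sets
    the rasterization resolution in dpi; the `%03d` in the output name
    asks ImageMagick to write numbered files (page-000, page-001, ...).
    """
    out_pattern = Path(out_dir) / f"page-%03d.{fmt}"
    return ["convert", "-density", str(density), str(pdf_path), str(out_pattern)]


def pdf_to_images(pdf_path, out_dir, density=300, fmt="png"):
    """Run `convert` and return the image files found in out_dir.

    Expect this to be slow and memory-hungry on large PDFs.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_convert_command(pdf_path, out_dir, density, fmt)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return sorted(Path(out_dir).glob(f"page-*.{fmt}"))


if __name__ == "__main__":
    # "big.pdf" and "pages" are placeholder names.
    for image in pdf_to_images("big.pdf", "pages"):
        print(image)
```

The resulting images would then be handed to whatever OCR program
you prefer. A lower density makes the conversion faster but usually
hurts recognition quality.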
Robert Huff