any way to turn a pdf file into something OCR-able?

Robert Huff roberthuff at rcn.com
Mon Dec 1 17:23:15 PST 2008


Roland Smith writes:

>  > 	pdftotext fails on the large [32MB] file I've got.  Is there any
>  > 	other way I can translate this huge file to ASCII or HTML or
>  > 	text?
>  

>  Please define "fail" in this context. I've used pdftotext on
>  documents exceeding 40MB. However, there are of course things that
>  don't work:
>  
>  1) Some PDFs are just wrappers around JPEG images. In this case
>  there is no text for pdftotext to convert => epic fail.

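	In this case "convert" from the ImageMagick port will turn the
PDF into a series of image files (.jpg, .gif, or whatever format you
choose), one per page, which an OCR program can then read.  Read the
manual carefully before attempting; also note that this can be a slow
process on a large file.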
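
	A minimal sketch of that step, wrapped in Python so the command
line is easy to inspect before running it.  The function names, the
"page-%03d" output pattern and the 300 dpi density are assumptions for
illustration, not anything the manual mandates; ImageMagick also needs
Ghostscript installed to read PDFs, and on newer versions the command
may be "magick" rather than "convert".

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, fmt="jpg", density=300):
    """Build the argv list for ImageMagick's convert (hypothetical helper).

    -density sets the rasterization resolution in dpi; the %03d in the
    output name makes convert write one numbered image per PDF page.
    """
    out_pattern = Path(out_dir) / f"page-%03d.{fmt}"
    return ["convert", "-density", str(density), str(pdf_path), str(out_pattern)]


def pdf_to_images(pdf_path, out_dir, fmt="jpg", density=300):
    """Run convert and return the generated image paths, sorted by name."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found in PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if convert exits non-zero.
    subprocess.run(build_convert_command(pdf_path, out_dir, fmt, density), check=True)
    return sorted(Path(out_dir).glob(f"page-*.{fmt}"))
```

	Calling pdf_to_images("scan.pdf", "pages") would leave files such
as pages/page-000.jpg, pages/page-001.jpg and so on, ready to hand to
whatever OCR program you prefer.  Lowering the density speeds things up
at the cost of OCR accuracy.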


			Robert Huff



