TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
I have been using OmniPage 14 quite a bit lately for scanning poor-quality
faxes, protected PDFs, and PDFs that are simply large scans. I have to say
it does a thigh-slapp'n good job, including a pretty good attempt at
keeping something that resembles the original formatting and images. For
those of you who aren't familiar with OCR programs, it also shows you what
it thinks the original says next to the actual original, so that you can
make corrections right away if necessary. That works in different
languages, too.
For long docs, it can take a while, but I think that depends more on your RAM
and processor speed. I recently did a 350-page PDF that was just a bunch of
pages scanned from a book (three columns per page, with words from up to 7
languages in each column). I left my 500 MB of RAM and 2.4 GHz processor to
take over and had lunch, then came back and there it was. It got all the
columns on each page and all the words, though it stumbled on the formatting
a bit. However, the file was still quite usable for my purposes. I have so
far found only a few language-specific letter recognition problems (in this
case, I think it didn't feel like letting me check target against source
because it was pooped out, but I don't blame it ;) and thankfully only in
parts that I don't need.
I have also heard good things about ABBYY FineReader, but have not yet tried
it out. There may be a trial version of it; there isn't one for OmniPage 14,
AFAIK.
HTH
Cass :)
> It's called "optical character recognition" or OCR...although I don't
> envy you doing "hundreds of pages"...
>
> The most advanced packages even try to maintain original formatting.
>
> Perhaps the most popular of the advanced packages is Omnipage Pro 14
> Office, from http://www.scansoft.com. As they say about it:
>
> "<snip>...marketing blurb </snip>."
>
> David
>
> On 6/17/05, twriter01 -at- hotmail -dot- com <twriter01 -at- hotmail -dot- com> wrote:
> >
> > Help! Suggestions on getting hundreds of scanned docs (.pdf) into
> > MS Word format. These docs were hard copies scanned into .pdf format.
> >
> > Only options I can think of are re-typing everything or using
> > voice-recognition. Any suggestions? Any technologies? TIA!
>