TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
Geoff has seen fit to dispute the results I got in testing OCR
software, holding out the PC Magazine tests as a contrary example.
In many cases, the accuracy of OCR is dependent upon the type font of
the original and upon the accuracy of any copies that are made.
As you may know, successive generations of xerographic copies become
increasingly distorted--making any OCR much more problematic.
In addition, the artifacts that develop (small spots and such) with
such successive generations can be additional challenges.
That said, some fonts hold up much better through copying. For
example, a 12-point slab serif monospace holds up quite well, as would
fonts that have regular stroke weight (Helvetica, for example).
TypeReader, which I mentioned before, is supposed to recognize more
than two thousand different fonts--quite a feat in retaining the
original formatting!
General accuracy, then, varies widely with the font (including size,
weight, tracking, etc.), the fidelity of the input document (including
whether the lines of type are skewed as well as how faithfully the
text is reproduced), and the resolution of the scan.
In my testing, quite a few of the documents had originally been
printed from typewriter script...very clean and relatively large.
Should you attempt to use OCR on scans of very small original type,
perhaps in a design with a very small x-height, your results could
certainly vary.
For general corporate communications, though, the accuracy may be far
higher than for other kinds of input documents.
Thus, I stand by the results I noted through two full days of tests on
many kinds of input materials--but all of them from the usual run of
corporate documents.
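
For anyone who wants to put a number on their own tests: this post
does not say how accuracy was scored, so the following is only a
minimal sketch of one common approach, character accuracy based on
edit distance against a hand-checked reference text. The function
names (edit_distance, character_accuracy) and the sample strings are
hypothetical and chosen for illustration; they are not taken from
TypeReader or any other OCR product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # drop ca from a
                curr[j - 1] + 1,     # add cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters recovered correctly, clamped at 0.

    Assumes `reference` is a manually verified transcription of the page.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # "rn" read as "m" is a typical confusion on degraded photocopies.
    print(character_accuracy("modern", "modem"))
```

Scoring the same page at several copy generations, or in several
fonts, would give a direct way to compare the effects described
above.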