Re: Scanned docs to revise? (Take II)

Subject: Re: Scanned docs to revise? (Take II)
From: David Neeley <dbneeley -at- gmail -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Sun, 19 Jun 2005 11:47:09 -0500


Geoff has seen fit to dispute the results I got in testing OCR
software, holding out the PC Magazine tests as a contrary example.

In many cases, the accuracy of OCR depends on the typeface of the
original and on the fidelity of any copies that are made.

As you may know, successive generations of xerographic copies become
increasingly distorted--making any OCR much more problematic.

In addition, the artifacts (small spots and such) that develop over
such successive generations pose additional challenges.

That said, some fonts hold up much better through copying. For
example, a 12-point slab serif monospace holds up quite well, as do
fonts with a uniform stroke weight (Helvetica, for example).

TypeReader, which I mentioned before, is supposed to recognize more
than two thousand different fonts--quite a feat in retaining the
original formatting!

General accuracy, then, varies widely with the font (including size,
weight, tracking, etc.), the fidelity of the input document (including
whether the lines of type are skewed and how faithfully the text is
reproduced), and the resolution of the scan.

In my testing, quite a few documents had originally been printed in
typewriter script...very clean and relatively large. Should you
attempt to use OCR on scans of very small original type, perhaps in a
design with a very small x-height, for example, your results could
certainly vary.

For general corporate communications, though, the accuracy may be far
higher than for other kinds of input documents.

Thus, I stand by the results I noted through two full days of tests on
many kinds of input materials--but all of them from the usual run of
corporate documents.

David


References:
Scanned docs to revise?: From: Geoff Hart
Re: Scanned docs to revise?: From: David Neeley
Scanned docs to revise? (Take II): From: Geoff Hart
Re: Scanned docs to revise? (Take II): From: Lou Quillio
