RE: Exporting text from recalcitrant PDFs

From: "Combs, Richard" <richard -dot- combs -at- Polycom -dot- com>
To: "Cardimon, Craig" <ccardimon -at- M-S-G -dot- com>, "Techwr-l" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Fri, 17 Aug 2007 12:07:08 -0600

Cardimon, Craig wrote:

> The PDF I dealt with was not locked, but it was still uncooperative.

You'll have to be a bit more specific. What was "uncooperative" --
Acrobat? Adobe Reader? Something else? What version? How was it
uncooperative?

If you were using Acrobat, did you try all the text-based Save As
formats (doc, rtf, txt)? Did you try the Select tool and copying the
text?

> I resorted to a feature called "OCR Text Recognition," which
> allowed me to proceed, but the going was kind of brutal.

Oh, I see. Your PDF didn't _contain_ text! It contained _images_ of
text. It was created by scanning hardcopy pages.

OCR (optical character recognition) is the only option, short of
retyping, for converting a bitmap image into editable text. It isn't
perfect, so expect to proofread whatever it produces.
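
If you'd rather check a PDF before fighting with it, here's a rough sketch in Python of the distinction above. It assumes you've already pulled whatever text each page yields (Save As txt, or copy and paste) into a list of strings, one per page. The function names and the 20-character cutoff are made up for illustration, not part of Acrobat or any library.

```python
def classify_pages(page_texts, min_chars=20):
    """Label each page 'text' or 'image-only'.

    page_texts: one string per page, holding whatever text an export
    or copy-and-paste produced for that page (hypothetical input).
    min_chars: arbitrary cutoff; a page yielding fewer non-whitespace
    characters than this is treated as a scanned image.
    """
    labels = []
    for text in page_texts:
        visible = "".join(text.split())  # drop all whitespace
        labels.append("text" if len(visible) >= min_chars else "image-only")
    return labels


def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers that yielded little or no text."""
    labels = classify_pages(page_texts, min_chars)
    return [n for n, label in enumerate(labels, start=1) if label == "image-only"]


if __name__ == "__main__":
    sample = [
        "This page has plenty of real, selectable text on it.",
        "",        # scanned page: the export produced nothing
        "  \n ",   # scanned page: whitespace only
    ]
    print(classify_pages(sample))
    print(pages_needing_ocr(sample))
```

Pages flagged this way are the ones where OCR is your only recourse. The cutoff is a judgment call, since a nearly blank page with real text would be flagged too.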

I assume the people who want this "data extracted" aren't ripping off
someone else's docs, so they should have the source files from which the
scanned pages were created. If those have been lost, explain to them
that imperfect OCR is the best you can do, and they need to be more
careful with their intellectual property in the future. :-)

Richard


------
Richard G. Combs
Senior Technical Writer
Polycom, Inc.
richardDOTcombs AT polycomDOTcom
303-223-5111
------
rgcombs AT gmailDOTcom
303-777-0436
------




