Subject: Re: PDF to Word Conversion Tool
From: letoured -at- together -dot- net
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Sat, 13 Jul 2002 21:05:09 -0400
Dick Margulis <margulis -at- fiam -dot- net> said:
>JimGroark -at- aol -dot- com wrote:
>>
>> What *I'm* looking for is something that will facilitate extracting multi-
>> page tables (or what look like tables) from .PDF files which are hundreds
>> of pages in length. Then, I want to be able to paste the "tables" into
>> MS Access or MS Excel, for manipulation.
>>
>Jim,
>That won't happen. There are no tables in a PDF file. There are just
>characters positioned on a page, with no tags, identification, or semantic
>relationships among them. They are only arranged in what appears to be a
>table because of the way your (table-cognizant) brain processes the image.
>What I've found with the tools I've tried is that if the table consists of
>nicely aligned columns of figures, you stand a fighting chance. If any cells
>have multiline text entries or if any columns have entries that vary
>dramatically in length, or if any cells are blank, you can basically forget
>about it.
>If you really need to put the data into tables and you have no access to the
>original document, the easiest and cheapest solution may be to pay someone
>to keyboard the data.
The trouble is that keying in numerical data introduces errors -- and he
has hundreds of pages to do.
I don't know if this can be done in Windows, but with hundreds of pages to deal
with it might be worth a try: IBM's OS/2 comes with a TIFF printer driver. You
can print a file to it and get a multi-page TIFF file at fax resolution. I've
then run that file through an OCR program such as TextBridge, used its training
mode to correct the data, and saved the result in the format needed.
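
For anyone who wants to see why Dick says nicely aligned columns "stand a fighting chance," here is a rough Python sketch of the idea. It assumes some extraction or OCR step has already produced text fragments with page positions. The (x, y, text) tuple format, the rebuild_table name, and the tolerance values are all made up for illustration and do not come from any particular tool. It also assumes y grows down the page and that x is the left edge of each fragment.

```python
from typing import List, Tuple

# Hypothetical input format: (x, y, text), where x is the fragment's left
# edge and y is its baseline, with y increasing down the page (an assumption).
Word = Tuple[float, float, str]


def rebuild_table(words: List[Word], y_tol: float = 2.0,
                  x_tol: float = 5.0) -> List[List[str]]:
    """Guess rows and columns purely from the positions of text fragments."""
    # Column anchors: distinct left edges, merging any that lie within
    # x_tol of the anchor that started the current cluster.
    anchors: List[float] = []
    for x in sorted({w[0] for w in words}):
        if not anchors or x - anchors[-1] > x_tol:
            anchors.append(x)

    # Rows: walk the fragments top to bottom and start a new row whenever
    # the baseline moves more than y_tol below the row's first fragment.
    rows: List[Tuple[float, List[str]]] = []
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if not rows or y - rows[-1][0] > y_tol:
            rows.append((y, [""] * len(anchors)))
        # Put the fragment in the column whose anchor is nearest.
        col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - x))
        cells = rows[-1][1]
        cells[col] = (cells[col] + " " + text).strip()
    return [cells for _, cells in rows]


# A well-behaved table: every entry starts at one of three left edges.
aligned = [
    (50, 100, "Widget"), (150, 100, "12"), (250, 100, "3.50"),
    (50, 112, "Gadget"), (150, 112, "7"), (250, 112, "10.00"),
]
table = rebuild_table(aligned)

# A cell that wraps onto a second line is indistinguishable from a new row.
wrapped = rebuild_table(aligned + [(50, 124, "(large)")])
```

With the aligned input the guess comes out as two clean rows of three cells. Add one wrapped cell and the sketch invents a third row, because nothing in the positions says the extra line belongs to the row above. Right-aligned figures whose left edges wander by more than the tolerance would split one column into several in the same way.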
-----------------------------------------------------------
letoured -at- together -dot- net
-----------------------------------------------------------