Subject: Re: Pdf to Word
From: "Dick Margulis" <margulis -at- mail -dot- fiam -dot- net>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Sun, 2 Mar 2003 15:33:04 -0500
Alan,
You are volunteering to take on quite a challenge. A PDF is not, by default, a document. Some PDFs include embedded information that helps with retrieving document-type structural information. But for the most part, a PDF is just a rectangular space with characters placed on it in spatial relation to each other but not in any particular order.
Even the seemingly simple task of looking for instances of a particular word in a PDF is a non-trivial challenge (one that has been solved, but non-trivial nonetheless).
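To make the ordering problem concrete, here is a minimal Java sketch. It is an illustration only, not taken from any PDF library: the Glyph class, the fixed 6-unit character advance, and the tolerance values are all assumptions chosen for the example. It takes characters that arrive in arbitrary order with page coordinates, groups them into lines by vertical position, sorts each line left to right, and only then can it look for a word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GlyphOrderSketch {

    /** Hypothetical model: one character plus the page position of its origin. */
    static final class Glyph {
        final double x;
        final double y;
        final char ch;

        Glyph(double x, double y, char ch) {
            this.x = x;
            this.y = y;
            this.ch = ch;
        }
    }

    /**
     * Rebuilds text lines from glyphs supplied in arbitrary order.
     * Glyphs within lineTolerance of a row's first glyph (vertically) share that row;
     * a horizontal step larger than spaceGap between neighbours becomes a space.
     */
    static List<String> rebuildLines(List<Glyph> glyphs, double lineTolerance, double spaceGap) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        // Assumes PDF-style coordinates where y grows upward, so the top line has the largest y.
        sorted.sort(Comparator.comparingDouble((Glyph g) -> -g.y));

        List<List<Glyph>> rows = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(last.get(0).y - g.y) <= lineTolerance) {
                last.add(g);
            } else {
                List<Glyph> row = new ArrayList<>();
                row.add(g);
                rows.add(row);
            }
        }

        List<String> lines = new ArrayList<>();
        for (List<Glyph> row : rows) {
            row.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : row) {
                if (prev != null && g.x - prev.x > spaceGap) {
                    sb.append(' ');
                }
                sb.append(g.ch);
                prev = g;
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    /** Returns the index of the first rebuilt line containing the word, or -1 if none does. */
    static int findWord(List<String> lines, String word) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(word)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The glyphs of "Hi PDF" / "ok", deliberately scrambled.
        List<Glyph> scrambled = Arrays.asList(
                new Glyph(96, 700, 'D'), new Glyph(78, 688, 'k'), new Glyph(72, 700, 'H'),
                new Glyph(102, 700, 'F'), new Glyph(72, 688, 'o'), new Glyph(90, 700, 'P'),
                new Glyph(78, 700, 'i'));
        List<String> lines = rebuildLines(scrambled, 2.0, 9.0);
        for (String line : lines) {
            System.out.println(line);
        }
        System.out.println("\"PDF\" found on line index " + findWord(lines, "PDF"));
    }
}
```

Even this toy version depends on guessed thresholds, and it ignores varying font sizes, kerning, multiple columns, rotated text, and words hyphenated across lines, which is where the real difficulty lies.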
I suggest you research some of the work that has already been done in this field. You might start with www.pdfzone.com, looking in particular for some of the third-party apps that purport to extract structured document information from PDFs. You might also pay attention to the tools built into Acrobat 5 (far from perfect, but pretty good, all things considered).
Look in Google Groups for posts in comp.text.pdf on the subject. Aandi Inston, in particular, the great guru of that forum, has expounded articulately on the theoretical and technical challenges involved in attempting what you want to do.
I am not one to say something is impossible. When people tell me that, I just take it as a dare to prove them wrong. I would like to suggest, though, that this may not be a one-person spare-time project.
As to the question of whether there is a demand for it, yes, people ask for such a tool almost every day.
Good luck,
Dick
"Alan Jordan" <vote -at- lyndan -dot- net> wrote:
>
>I am a student looking to create a java based application to convert files
>from pdf format to Msword format. The resultant application is intended to
>be an open source one.