Re: Pdf to Word

Subject: Re: Pdf to Word
From: "Dick Margulis " <margulis -at- mail -dot- fiam -dot- net>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Sun, 2 Mar 2003 15:33:04 -0500


Alan,

You are volunteering to take on quite a challenge. A PDF is not, by default, a document. There are some PDFs that include embedded information that helps with retrieving document-type structural information. But for the most part, a PDF is just a rectangular space with characters placed on it in spatial relation to each other but not in any particular order.

Even the seemingly simple task of looking for instances of a particular word in a PDF is a non-trivial challenge (one that has been solved, but non-trivial nonetheless).
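To make that concrete, here is a minimal Java sketch of the kind of work involved, under some loud assumptions: the `Glyph` class, the `toLines` and `containsWord` methods, and the tolerance values are all hypothetical names and numbers chosen for illustration, and the sample coordinates are made up. It is not how Acrobat or any particular tool does it. It only shows that characters arriving in arbitrary order have to be grouped into lines and sorted by position before even a simple word search can work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlyphOrderSketch {

    /** Hypothetical model: one character and the page position of its origin. */
    public static final class Glyph {
        final char ch;
        final double x;
        final double y;

        public Glyph(char ch, double x, double y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Rebuilds text lines from glyphs supplied in arbitrary order.
     * Glyphs whose y differs from the first glyph of a row by at most
     * lineTolerance share that row; a horizontal step between glyph
     * origins larger than spaceGap is treated as a word space.
     * Both thresholds are crude, assumed heuristics.
     */
    public static List<String> toLines(List<Glyph> glyphs, double lineTolerance, double spaceGap) {
        List<Glyph> byHeight = new ArrayList<>(glyphs);
        // PDF coordinates have y increasing upward, so the top line has the largest y.
        byHeight.sort((a, b) -> Double.compare(b.y, a.y));

        // Group glyphs with nearly equal baselines into rows.
        List<List<Glyph>> rows = new ArrayList<>();
        for (Glyph g : byHeight) {
            List<Glyph> row = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (row == null || Math.abs(row.get(0).y - g.y) > lineTolerance) {
                row = new ArrayList<>();
                rows.add(row);
            }
            row.add(g);
        }

        // Within each row, order left to right and insert spaces at wide gaps.
        List<String> lines = new ArrayList<>();
        for (List<Glyph> row : rows) {
            row.sort((a, b) -> Double.compare(a.x, b.x));
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : row) {
                if (prev != null && g.x - prev.x > spaceGap) {
                    sb.append(' ');
                }
                sb.append(g.ch);
                prev = g;
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    /** Naive search: true if any reconstructed line contains the word. */
    public static boolean containsWord(List<Glyph> glyphs, String word,
                                       double lineTolerance, double spaceGap) {
        for (String line : toLines(glyphs, lineTolerance, spaceGap)) {
            if (line.contains(word)) {
                return true;
            }
        }
        return false;
    }

    /** Made-up sample: "PDF" above "to Word", deliberately shuffled. */
    public static List<Glyph> sample() {
        return Arrays.asList(
            new Glyph('r', 102, 700), new Glyph('P', 72, 720), new Glyph('o', 96, 700),
            new Glyph('t', 72, 700), new Glyph('F', 84, 720), new Glyph('d', 108, 700),
            new Glyph('W', 90, 700.2), // slight baseline jitter, still the same line
            new Glyph('D', 78, 720), new Glyph('o', 78, 700));
    }

    public static void main(String[] args) {
        // Read in stored order, the sample spells "rPotFdWDo".
        System.out.println(toLines(sample(), 2.0, 9.0));              // prints [PDF, to Word]
        System.out.println(containsWord(sample(), "Word", 2.0, 9.0)); // prints true
    }
}
```

Even this toy ignores multiple columns, rotated text, words hyphenated across lines, and fonts with varying character widths, which is where the real difficulty starts.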

I suggest you research some of the work that has already been done in this field. You might start with www.pdfzone.com, looking in particular for some of the third-party apps that purport to extract structured document information from PDFs. You might also pay attention to the tools built into Acrobat 5 (far from perfect, but pretty good, all things considered).

Look in Google Groups for posts in comp.text.pdf on the subject. Aandi Inston, in particular, the great guru of that forum, has expounded articulately on the theoretical and technical challenges involved in attempting what you want to do.

I am not one to say something is impossible. When people tell me that, I just take it as a dare to prove them wrong. I would like to suggest, though, that this may not be a one-person spare-time project.

As to the question of whether there is a demand for it, yes, people ask for such a tool almost every day.

Good luck,

Dick

"Alan Jordan" <vote -at- lyndan -dot- net> wrote:

>
>I am a student looking to create a java based application to convert files
>from pdf format to Msword format. The resultant application is intended to
>be an open source one.



