RE: Converting Word files into XML

Subject: RE: Converting Word files into XML
From: "Ole Andersen" <ora -at- dita-exchange -dot- com>
To: "Kevin McGowan" <thatguy_80 -at- hotmail -dot- com>, <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Fri, 30 May 2008 06:26:33 +0200

Kevin,

I agree it would be nice to have a little "mapping tool" to smoothly migrate Word to DITA. But isn't converting the data (if I understand you correctly) just a tiny first step?

CHALLENGES?
What about the fact that you are converting monolithic documents to self-contained DITA topics (chunks)? In Word you will have all kinds of references (see table above, see section XY on page ZX, hyperlinks and bookmarks from one chapter to another, etc.), and you will have to deal with the fact that writing self-contained topics probably requires some training.
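As a rough illustration of the reference problem, here is a minimal Python sketch that flags paragraphs containing positional references, so that someone can rewrite them before the content is split into topics. The phrase patterns and the function name `find_positional_references` are assumptions made up for this example; a real document set would need its own list.

```python
import re

# Hypothetical phrase patterns; extend them for the documents at hand.
PATTERNS = [
    re.compile(r"\bsee\s+(?:the\s+)?(?:table|figure|section|chapter)\b", re.I),
    re.compile(r"\bon\s+page\s+\w+", re.I),
    re.compile(r"\b(?:above|below)\b", re.I),
]

def find_positional_references(paragraphs):
    """Return (index, text) for every paragraph that matches a pattern."""
    hits = []
    for index, text in enumerate(paragraphs):
        if any(pattern.search(text) for pattern in PATTERNS):
            hits.append((index, text))
    return hits

if __name__ == "__main__":
    sample = [
        "Install the tool.",
        "See table above for values.",
        "Details are in section 4 on page 12.",
    ]
    for index, text in find_positional_references(sample):
        print(index, text)
```

A pass like this only finds candidates; deciding how to reword each reference for a self-contained topic is still a job for a writer.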

Apart from that, you probably agree that no two random Word authors have used the same set of Word styles, which means that the "mapping tool" probably has to be very flexible?
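To make the flexibility point concrete, here is a minimal Python sketch of a style-to-element mapping, assuming the Word paragraphs have already been extracted as (style name, text) pairs by some other means. The table `STYLE_MAP` and the function `paragraphs_to_topic` are hypothetical names for this example, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names to DITA element names.
# Each author or department would supply its own table.
STYLE_MAP = {
    "Heading 1": "title",
    "Normal": "p",
    "Body Text": "p",
    "Note": "note",
}

def paragraphs_to_topic(topic_id, paragraphs, style_map=STYLE_MAP):
    """Build a DITA <topic> from (style_name, text) pairs.

    Returns the topic element plus the paragraphs that could not be
    placed, so a human can clean those up afterwards.
    """
    topic = ET.Element("topic", id=topic_id)
    title = ET.SubElement(topic, "title")
    body = ET.SubElement(topic, "body")
    unmapped = []
    for style, text in paragraphs:
        tag = style_map.get(style)
        if tag == "title" and title.text is None:
            title.text = text  # first heading becomes the topic title
        elif tag is None or tag == "title":
            unmapped.append((style, text))  # unknown style or extra heading
        else:
            ET.SubElement(body, tag).text = text
    return topic, unmapped

if __name__ == "__main__":
    topic, leftovers = paragraphs_to_topic(
        "intro",
        [("Heading 1", "Intro"), ("Normal", "Hello"), ("Fancy", "x")],
    )
    print(ET.tostring(topic, encoding="unicode"))
    print("Needs manual clean-up:", leftovers)
```

Keeping the mapping in a plain table means it can be swapped per author, and the list of leftovers is the part that still needs manual clean-up.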

Last but not least: consistency. Companies often claim that they use Word styles consistently across their documents. But once you dig into the details, you will probably find that this is not entirely true.
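One way to test the consistency claim before committing to a conversion is to count which styles a document really uses. The sketch below is only an illustration: it assumes the paragraphs are available as (style name, text) pairs, and `audit_styles` and the expected style set are invented for this example.

```python
from collections import Counter

def audit_styles(paragraphs, expected_styles):
    """Count style usage and report the styles outside the agreed set."""
    counts = Counter(style for style, _ in paragraphs)
    unexpected = {s: n for s, n in counts.items() if s not in expected_styles}
    return counts, unexpected

if __name__ == "__main__":
    sample = [
        ("Heading 1", "Intro"),
        ("Normal", "First paragraph."),
        ("Normal", "Second paragraph."),
        ("My Bold Para", "Hand-formatted text."),
    ]
    counts, unexpected = audit_styles(sample, {"Heading 1", "Normal"})
    print("All styles:", dict(counts))
    print("Outside the agreed set:", unexpected)
```

A long list of unexpected styles is an early warning that the clean-up share of the job will be larger than hoped.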

BUT having said that, I still agree with you. It must be possible to create a tool like this to cover maybe 50% of the task and then "clean up" the remaining 50% once the conversion is completed.

THE WAY WE DO IT...
When we face a challenge like this, the main obstacle is converting monolithic documents to self-contained topics. This is in fact a manual (or semi-manual) task that requires some human intervention.

But we are not suggesting tagging line by line.

In our solution you have the browser (the DITA Exchange topic editor) open in one window and the Word file open in a second window. Then you cut and paste from one window to the other. The tagging is BTW not visible (it is done automatically), so the challenge is to cut from a Word document and paste into a forms-based browser interface.
This also means that the migration/conversion process is not limited to XML experts: subject matter experts can help in the process once they have had a few hours of guidance. So "grabbing" fairly large chunks of Word content and pasting them into the relevant field in the DITA editor is currently our "best practice".

After that we extract the TOC from the Word document and compose the DITA map - but that's probably the easy task?
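For what it is worth, the map step could look roughly like the Python sketch below. It assumes the TOC has already been pulled out of Word as (level, title, href) entries; the function name `toc_to_map` and the file names are made up for this example, and this is not how the DITA Exchange editor itself works.

```python
import xml.etree.ElementTree as ET

def toc_to_map(map_title, toc_entries):
    """Turn (level, title, href) TOC entries into a nested DITA <map>."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = map_title
    # stack[i] is the most recent element at depth i; depth 0 is the map.
    stack = [root]
    for level, title, href in toc_entries:
        # Clamp the level so a skipped heading level still nests
        # under the nearest available parent.
        level = max(1, min(level, len(stack)))
        del stack[level:]
        ref = ET.SubElement(stack[-1], "topicref", href=href, navtitle=title)
        stack.append(ref)
    return root

if __name__ == "__main__":
    toc = [
        (1, "Introduction", "intro.dita"),
        (2, "Setup", "setup.dita"),
        (1, "Usage", "usage.dita"),
    ]
    print(ET.tostring(toc_to_map("User Guide", toc), encoding="unicode"))
```

The heading levels from the Word TOC drive the nesting of the topicref elements, so the map mirrors the original chapter structure.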

I'm not sure if you can use my input, but now I have passed on the way we currently do it :-)

Thanks,
Ole

Best Regards
>< Content Technologies ApS
Ole Rom Andersen
Director, Co-founder
Harevej 23
DK-8660 Skanderborg
 
Mobile: +45-4044-0553
Phone: +45-3696-0899
Skype: olerom
ora -at- dita-exchange -dot- com
http://www.dita-exchange.com

-----Original Message-----
From: techwr-l-bounces+ora=dita-exchange -dot- com -at- lists -dot- techwr-l -dot- com [mailto:techwr-l-bounces+ora=dita-exchange -dot- com -at- lists -dot- techwr-l -dot- com] On Behalf Of Kevin McGowan
Sent: 29 May 2008 17:39
To: techwr-l -at- lists -dot- techwr-l -dot- com
Subject: Converting Word files into XML


Hi all,

Recently started another new contract, and will most likely be moving some existing, thankfully small, documents from Word format into DITA - XML via FrameMaker or XMetaL.

Thing is, I just chatted with a couple of guys here whose exclusive job it is to take GIANT Word files (they can range from 50 to 1200 pages) and convert them into XML (not DITA, but some other DTD). I just got a tour of what they do, and they literally go through line by line in Dreamweaver, assigning tags as they go.

Has no one yet developed an amazing little tool that could map Word styles into XML tags to provide clean output? There's gotta be a faster way to do this, isn't there?

Cheers,
Kevin

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create HTML or Microsoft Word content and convert to Help file formats or
printed documentation. Features include support for Windows Vista & 2007
Microsoft Office, team authoring, plus more.
http://www.DocToHelp.com/TechwrlList

True single source, conditional content, PDF export, modular help.
Help & Manual is the most powerful authoring tool for technical
documentation. Boost your productivity! http://www.helpandmanual.com

---
You are currently subscribed to TECHWR-L as ora -at- dita-exchange -dot- com -dot-

To unsubscribe send a blank email to
techwr-l-unsubscribe -at- lists -dot- techwr-l -dot- com
or visit http://lists.techwr-l.com/mailman/options/techwr-l/ora%40dita-exchange.com


To subscribe, send a blank email to techwr-l-join -at- lists -dot- techwr-l -dot- com

Send administrative questions to admin -at- techwr-l -dot- com -dot- Visit
http://www.techwr-l.com/ for more resources and info.

