Convert Word files to XML?

Subject: Convert Word files to XML?
From: Richard Hamilton <dick -at- rlhamilton -dot- net>
To: "techwr-l -at- lists -dot- techwr-l -dot- com TECHWR-L" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Fri, 25 Jul 2014 11:21:08 -0700

Hi Steve,

There are several factors. The most important are: what XML schema you are converting to, how clean your Word content is, and how much content you need to convert.

Bottom line for me is that if you have a lot of content to convert, you should seriously consider contracting the job out to a conversion company, unless you have some serious expertise with XSL and related tools.

Here is some detail on tools to consider if you want to go it alone:

I convert Word to DocBook XML using OpenOffice, which can export DocBook directly. However, sometimes it's better to export HTML and then use a utility called Herold to convert the HTML to DocBook. I've also used the rather circuitous route of uploading Word to a Confluence wiki, then exporting DocBook using a plug-in exporter developed by a company called k15t Software. Which route I use in a given case depends on what the input looks like.

You can convert Word to DITA using DITA for Publishers (dita4publishers.sourceforge.net). I haven't used it myself, but I know the developer (Eliot Kimber), and he does quality work, so I'd definitely give it a try if you're headed towards DITA.

One caveat: I've found it exceedingly rare for a conversion to be completely clean. Plan on some cleanup of the output of any of these tools, using an XSL stylesheet, Perl, manual editing, or a combination of all three, unless your input is really simple and well suited to the tool you use (which, with Word, I've never seen :-).
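
As a rough illustration of the kind of scripted cleanup described above, here is a minimal sketch in Python rather than XSL or Perl. It assumes one common kind of conversion debris, empty <para> elements left behind by blank Word paragraphs, and it assumes un-namespaced DocBook 4-style element names. The function names and the sample document are hypothetical, not taken from any of the tools mentioned, and real output will likely need other fixes as well.

```python
import xml.etree.ElementTree as ET


def is_empty(elem):
    """True if the element has no child elements and no non-whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def drop_empty_paras(root, tag="para"):
    """Remove empty <para> elements in place and return how many were removed.

    Any text that follows a removed element (its "tail") is dropped too.
    Between block-level elements that is normally just whitespace.
    """
    removed = 0
    # Copy both sequences with list() so the tree is not mutated while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag and is_empty(child):
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical converter output with two empty paragraphs.
SAMPLE = """<article>
  <title>Sample</title>
  <para>First paragraph.</para>
  <para>   </para>
  <section>
    <title>Details</title>
    <para/>
    <para>Second <emphasis>paragraph</emphasis>.</para>
  </section>
</article>"""

root = ET.fromstring(SAMPLE)
removed = drop_empty_paras(root)
remaining = ["".join(p.itertext()) for p in root.iter("para")]

print(removed)
print(remaining)
```

For DocBook 5 output, which uses a namespace, the tag argument would need the namespace-qualified form that ElementTree expects, for example "{http://docbook.org/ns/docbook}para".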

Best regards,
Richard
-------
XML Press
XML for Technical Communicators
http://xmlpress.net
hamilton -at- xmlpress -dot- net



On Jul 25, 2014, at 10:46 AM, Janoff, Steven wrote:

> Hi,
>
> For those with experience converting Word files to XML:
>
> What's the easiest or most effective way you've found to do this?
>
> Does it depend on the XML editor you're importing into?
>
> Arbortext is currently editor of choice, but I might also have the opportunity to install Oxygen at home.
>
> Thanks for your advice. I'll be researching on the web also, but that looks like a bit of a mish-mash.
>
> Steve




