Subject: RE: Converting Word files into XML
From: Stuart Burnfield <slb -at- westnet -dot- com -dot- au>
To: techwr-l -at- lists -dot- techwr-l -dot- com
Date: Sat, 31 May 2008 00:04:33 +0800
> ... take GIANT Word file (they could range from 50-1200 pages)
> and convert them into XML (not DITA, but some other DTD). I
> just got a tour of what they do, and they literally go through
> line-by-line in Dreamweaver, assigning tags as they go.
> Has no one yet developed an amazing little tool that could map
> Word styles into XML tags to provide clean output?
It's harder than it looks. The problem is that Word docs are
unstructured, so it's not just a case of working through the doc line by
line and mapping each para style and character style to a corresponding
XML tag. The XML tags need to be nested correctly, and that nesting
information is mostly absent from the source document. If you could
guarantee that the Word document was formatted in a very consistent,
rigorous way, it would be a lot easier, but how often do you see a Word
doc like that?
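To make the nesting problem concrete, here is a rough Python sketch. The style names, the tag names, and the rule that an <item> must sit inside a <list> are all made up for illustration; a real DTD will have its own rules. A one-to-one mapping emits bare <item> elements, while the second function groups runs of consecutive same-style paragraphs and adds the wrapper that the source never recorded.

```python
from itertools import groupby
from xml.sax.saxutils import escape

# Hypothetical style-to-tag map; real style names and the target DTD will differ.
STYLE_TO_TAG = {"Heading1": "title", "Body": "para", "Bulleted1": "item"}
# Hypothetical rule: these tags must be wrapped in a parent element.
WRAPPER = {"item": "list"}


def flat_map(paragraphs):
    """Naive one-to-one mapping: each paragraph becomes one element."""
    return [
        f"<{STYLE_TO_TAG[style]}>{escape(text)}</{STYLE_TO_TAG[style]}>"
        for style, text in paragraphs
    ]


def nested_map(paragraphs):
    """Group runs of same-style paragraphs and wrap the runs that need a parent."""
    out = []
    for style, run in groupby(paragraphs, key=lambda p: p[0]):
        tag = STYLE_TO_TAG[style]
        elements = [f"<{tag}>{escape(text)}</{tag}>" for _, text in run]
        wrapper = WRAPPER.get(tag)
        if wrapper:
            out.append(f"<{wrapper}>" + "".join(elements) + f"</{wrapper}>")
        else:
            out.extend(elements)
    return out


# (paragraph style, text) pairs, roughly what you can read out of a Word doc.
doc = [
    ("Heading1", "Install"),
    ("Body", "Do the following:"),
    ("Bulleted1", "Unpack the archive"),
    ("Bulleted1", "Run setup"),
    ("Body", "Done."),
]

for line in nested_map(doc):
    print(line)
```

This only works because the sample is consistent. It says nothing about nested lists, sections that should contain their paragraphs, or a bullet that was faked with a Body style and a manual dash, which is where real Word docs fall down.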
> There's gotta be a faster way to do this, isn't there?
Yes. I've done a few conversions from Frame and Word to SGML (different
tags, same process). Frame's Save As XML preserves the name of the
Frame style--e.g. the Bulleted1 para style becomes <Bulleted1> in the XML
file. You can then open the file in a text editor and use find/replace
to map these tag names to the valid XML tags. If you're handy with
regular expressions and macros you can automate a lot of the mapping.
Write if you want to know more about this.
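As a rough idea of what the regular-expression step could look like, here is a small Python sketch. The tag map is hypothetical (Bulleted1 is the only name taken from the example above; the target names are invented), and the pattern is a plain find/replace over the text, not an XML parser, so it assumes no comments or CDATA sections that contain angle brackets.

```python
import re

# Hypothetical mapping from Frame style names to tags in the target DTD.
TAG_MAP = {"Bulleted1": "item", "Body": "para", "Heading1": "title"}

# Opening, closing, or self-closing tag: optional slash, name, optional attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)((?:\s[^<>]*)?/?)>")


def rename_tags(xml_text, tag_map=TAG_MAP):
    """Rename mapped tags; leave unmapped tags and all attributes untouched."""
    def swap(match):
        slash, name, rest = match.groups()
        return f"<{slash}{tag_map.get(name, name)}{rest}>"
    return TAG_RE.sub(swap, xml_text)


print(rename_tags('<Bulleted1 id="b1">Run setup</Bulleted1>'))
```

Renaming tags this way fixes the names only. Any wrapper elements the DTD needs still have to be added in a separate pass.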
This company does Word to *ML conversions. I haven't used them but they
will do a sample conversion so you can try before you buy:
"The Legacy Data Conversion Center can convert your Word Legacy Data to
any standard output format quickly and inexpensively. Our standard
conversion prices include conversion to DocBook, DITA, MIL-STD-38784,
and S1000D.
The Legacy Data Conversion Center supports conversion of Word legacy
data to any custom document type. For a small fee, we can convert your
legacy data to any proprietary or custom output format..."
Stuart