TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
Conclusions: Searching for an HTML-to-Word Conversion Tool
Subject: Conclusions: Searching for an HTML-to-Word Conversion Tool
From: Susan Peradze <susan -dot- peradze -at- peri -dot- com>
To: TECHWR-L -at- lists -dot- raycomm -dot- com
Date: Wed, 08 Sep 1999 11:48:08 -0400
Thanks to everyone who responded to my request for help in locating a
superior tool for converting Word documents to HTML. As you may recall
from my original posting, our tech writing group has certain formatting
in our Word documents that we wanted to preserve in HTML with little
rework after conversion.
Respondents were overwhelmingly in favor of HTML Transit, which we
tested along with other tools and chose in the end. Other suggestions
included Word (using its "Save as HTML" feature), RoboHelp,
Doc-to-Help, Piper Toolkit, converting to XML, and linking the Word
files (in Word format) to HTML documents.
Following are selections from the summary that one of my colleagues
wrote on our findings.
"In an effort to find the best tool for converting our MSWord 97
documents to HTML format, we evaluated several software packages. Of
course, Word's built-in 'Save As HTML' function was considered first,
followed closely by MS Front Page, since it is a corporate standard
tool. In addition to these readily available tools, we downloaded
evaluation copies of several others from the Internet. Some, like
RoboHelp and SkiSoft, were evaluated so early that their trial periods
had expired by the time these results were collated. Suffice it to say
that their results were less than desirable. Other tools, like Ant and
RTF to HTML, were so far off the mark that they were excluded.
Our Word documents include numbered section and subsection titles. For
example, '1.1 Introduction' is produced simply by typing the title of
the section and applying a heading style to it. Word automatically
inserts the proper chapter number, a period, and the section sequence
number. Figure and table titles are numbered automatically in the same
way. Word's conversion dropped the chapter number but retained the
sequence number in these areas. Word to Web dropped both the chapter
and sequence numbers from section titles, and retained only the
sequence number for figure and table titles. While Front Page correctly
retained the chapter and sequence numbers for sections, it dropped the
chapter number from figures and tables. Filtrix seemed to use a
numbering scheme of its own for sections, and only in some cases got a
figure or table number right. Transit correctly converted both section
numbers and figure and table numbers.
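Because Word generates these numbers itself rather than storing them
as typed text, a converter has to compute them and write them into the
HTML as literal labels. The Python sketch below is hypothetical: the
function number_items and its (kind, text) input format are invented
for illustration, it assumes figure and table numbers restart in each
chapter, and it does not describe how Transit or any other tool
actually works.

```python
# Hypothetical sketch: resolve Word-style automatic numbering into the
# literal labels a static HTML page needs. The (kind, text) input format
# is an assumption made for illustration only.

def number_items(items):
    """Return labels such as '1.1 Introduction' or 'Figure 1.2 Call flow'."""
    chapter = 0
    section = 0
    captions = {"Figure": 0, "Table": 0}
    labels = []
    for kind, text in items:
        if kind == "chapter":
            chapter += 1
            section = 0
            # Assumption: figure and table numbers restart in each chapter.
            captions = {key: 0 for key in captions}
            labels.append(f"{chapter} {text}")
        elif kind == "section":
            # Chapter number, a period, then the section sequence number.
            section += 1
            labels.append(f"{chapter}.{section} {text}")
        elif kind in captions:
            # Captions follow the same chapter.sequence pattern.
            captions[kind] += 1
            labels.append(f"{kind} {chapter}.{captions[kind]} {text}")
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return labels


if __name__ == "__main__":
    doc = [
        ("chapter", "Overview"),
        ("section", "Introduction"),
        ("Figure", "System diagram"),
        ("section", "Scope"),
        ("Table", "Supported features"),
        ("chapter", "Installation"),
        ("section", "Requirements"),
    ]
    for label in number_items(doc):
        print(label)
    # 1 Overview
    # 1.1 Introduction
    # Figure 1.1 System diagram
    # 1.2 Scope
    # Table 1.1 Supported features
    # 2 Installation
    # 2.1 Requirements
```

Each label carries the chapter number explicitly, which is the part
Word's own conversion dropped in these tests.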
Single-level lists, whether bulleted or numbered, were handled well by
Word, Word to Web, and Transit. Front Page dropped both numbers and
bullets. Filtrix converted the numbers satisfactorily but not the
bullets.
Word dropped one level of a three-level bulleted list, as did Word to
Web, which also replaced the bullets at levels one and two with
numbers. While Front Page nicely retained the indents of the same
bulleted list, it used no bullets. Filtrix failed to retain the proper
bullet characters and indents. Transit lost the indent at level two but
kept all three levels of bullets. For the multilevel numbered list,
Word and Word to Web both made a fair attempt, losing the indent at only
one level but not faithfully reproducing our style at levels three and
four. Front Page again retained proper indentation but dropped all
numbering. Filtrix produced neither proper numbering nor proper
indentation. Transit yielded an almost perfect conversion.
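In HTML, a second- or third-level bullet is a list nested inside an
item of the level above it, so one way a converter can lose a level is
by flattening that nesting. A minimal Python sketch, assuming a
hypothetical flat outline of (level, text) pairs such as a converter
might derive from Word's list styles; outline_to_html is an
illustrative name, not part of any tool tested here.

```python
# Hypothetical sketch: build nested HTML lists from a flat outline of
# (level, text) pairs. Each deeper level is a list nested inside an item
# of the level above, which is what gives it its own bullets and indent.
from html import escape


def outline_to_html(items, tag="ul"):
    lines = []
    depth = 0  # number of list elements currently open
    for level, text in items:
        if level < 1 or level > depth + 1:
            raise ValueError(f"level {level} cannot follow depth {depth}")
        if level == depth + 1:
            # Open a new list inside the previous item (or at top level).
            lines.append(f"<{tag}>")
            depth += 1
        else:
            # Same or shallower level: close the previous item, then close
            # any deeper lists together with the items that contain them.
            lines.append("</li>")
            while depth > level:
                lines.append(f"</{tag}>")
                lines.append("</li>")
                depth -= 1
        lines.append(f"<li>{escape(text)}")
    # Close whatever is still open at the end of the outline.
    while depth > 0:
        lines.append("</li>")
        lines.append(f"</{tag}>")
        depth -= 1
    return "\n".join(lines)


if __name__ == "__main__":
    three_levels = [
        (1, "Level one"),
        (2, "Level two"),
        (3, "Level three"),
        (1, "Back to level one"),
    ]
    print(outline_to_html(three_levels))
```

Passing tag="ol" produces the numbered equivalent. The sketch leaves
bullet shapes and indent sizes to the browser, so it does not address
reproducing a house style at levels three and four.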
In our documents, especially within procedures, lists of this type are
often interrupted by explanatory text or a figure. In these instances
it is important that the numbering resume where it left off before the
interruption. Both Word and Word to Web failed this test by restarting
the numbers. Filtrix numbered the interruption itself, and Transit
numbered the list accurately.
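HTML's ordered list element accepts a start attribute, so one way to
resume numbering after an interruption is to close the list before the
explanatory text and open a new one at the next number. The Python
sketch below assumes a hypothetical input of ("step", text) and
("note", text) pairs; render_procedure is an illustrative name, and the
sketch is not a description of how Transit handles this case.

```python
# Hypothetical sketch: render a numbered procedure that is interrupted by
# explanatory text, using <ol start="..."> so numbering resumes afterward.
from html import escape


def render_procedure(blocks):
    """blocks is a list of ("step", text) or ("note", text) pairs."""
    out = []
    last_step = 0    # number of the last step written so far
    in_list = False  # whether an <ol> is currently open
    for kind, text in blocks:
        if kind == "step":
            if not in_list:
                # Start (or restart) the list at the next step number.
                out.append(f'<ol start="{last_step + 1}">')
                in_list = True
            last_step += 1
            out.append(f"<li>{escape(text)}</li>")
        elif kind == "note":
            # Close the list so the note sits outside the numbering.
            if in_list:
                out.append("</ol>")
                in_list = False
            out.append(f"<p>{escape(text)}</p>")
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    if in_list:
        out.append("</ol>")
    return "\n".join(out)


if __name__ == "__main__":
    procedure = [
        ("step", "Lift the handset."),
        ("step", "Dial the access code."),
        ("note", "The system plays a greeting."),
        ("step", "Press the star key."),
    ]
    print(render_procedure(procedure))
    # <ol start="1">
    # <li>Lift the handset.</li>
    # <li>Dial the access code.</li>
    # </ol>
    # <p>The system plays a greeting.</p>
    # <ol start="3">
    # <li>Press the star key.</li>
    # </ol>
```

Because the note sits outside both lists, a browser does not number
it, and the second list picks up at 3 instead of restarting at 1.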
Apart from lists, text indentation was another concern, because each
tool indented differently, particularly for section headings. Word and
Front Page indented the headings but not the text that belonged to
them. Word to Web set everything flush left, and Filtrix was erratic,
sometimes indenting headings and sometimes not. Transit indented both
headings and text as in the original document.
Most of the tools tested automatically converted images to .gif format.
We tested with a screen capture and a Visio line drawing. All tools
except Transit and Filtrix gave the line drawing a black background,
making it useless. Transit produced a perfect conversion, while Filtrix
did not insert any figure at all.
We know from earlier trials that standard ASCII symbols are translated
properly and that other, nonstandard symbols are not. For example, no
tool accurately translated the large asterisk we use to represent the
star key on a telephone keypad.
We often use tables without borders (rules) to achieve proper text
alignment. All tools except Word to Web appeared to pass a simple test
of this.
Selective ruling in a table is another matter, however. None of the
tools tested could pass this test.
In conclusion, Transit is clearly the winner. We found only three
areas where its output was not perfect. Two of these (nonstandard
symbols and selective rules within tables) are unavoidable, as they
reflect limitations of HTML rather than of the conversion process. The
multilevel bulleted list, while not perfect, can presumably be made to
convert properly."
I hope this information helps those of you in the same predicament.
Susan Peradze
Staff Technical Writer
Implementation Department
Periphonics Corporation
4000 Veterans Memorial Highway
Bohemia, New York 11716
susan -dot- peradze -at- peri -dot- com