Headers and footers in HTML documents - Or a lesson in banging your head against the wall
Subject: Headers and footers in HTML documents - Or a lesson in banging your head against the wall
From: Geoff Hart <ghart -at- videotron -dot- ca>
To: TECHWR-L <techwr-l -at- lists -dot- techwr-l -dot- com>, "Puffer, Paula (Paula)" <Paula -dot- Puffer -at- ElPaso -dot- com>
Date: Wed, 24 May 2006 11:19:10 -0400
Paula Puffer reports: <<I've been given the task of improving our
process for moving manuals from Word Documents to HTML and PDF. The
current process was put in place before I got here. It was not tested
to see how it actually worked. It blows up on a regular basis and it
blows up differently each and every time.>>
Sympathies. On the plus side, at least you have an interesting problem
to solve and will look like a genius once you solve it... <gdr>
<<The process as it stands now is make corrections in Word, merge the
Word files using a program called Twins File Merger and then convert
the merged file using another program called Click to Convert. They
picked Click to Convert because the PDF and HTML files it generates look
EXACTLY like the Word documents (including the headers and footers),
which is what my bosses want. The PDF is no big deal, but the HTML
generated by Click to Convert is awful. Every line is assigned a class
id and if you open it up in an editor like FrontPage or Dreamweaver,
it's a nightmare to look at.>>
Blech. The whole purpose of HTML is to allow the layout to be flexible.
Where a fixed layout is important, use PDF. Different solutions for
different problems! Trying to make HTML behave like PDF makes little or
no sense.
If the class attributes are a problem, consider creating a macro to
strip them out or replace them with more appropriate markup. (More
details below.) Any good text editor should let you record such a
macro; if you don't want to add more software to your burden, use what
you already have: Word makes a great HTML editor ***if you change the
file name extension to *.txt and save files in that format (text)***.
Word does a poor job of saving to HTML, but changing the filename
tricks it into thinking that it's working only on a text file, and Word
is a powerful tool in that context. Just remember to change the
filenames back to .htm or .html when you're done.
The great advantage of using Word is that you already have it, it has a
decent search and replace tool (it's not grep, but it's still
surprisingly powerful), and it has a powerful macro language you don't
need to learn (i.e. you can record macros instead of programming them),
so you can automate quite a bit of the work with a bit of thought.
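
For anyone who would rather script the cleanup than record a Word
macro, here is a minimal sketch in Python of the "strip the class
attributes" idea. It is not what Click to Convert or Word does
internally; the function name and the sample markup are made up for
illustration, and the regular expressions assume reasonably tidy HTML
(no ">" characters inside attribute values).

```python
import re

# Any opening tag, e.g. <p class="c12"> (assumes no ">" inside attribute values).
TAG = re.compile(r"<[a-zA-Z][^<>]*>")

# A class attribute with a double-quoted, single-quoted, or unquoted value.
CLASS_ATTR = re.compile(
    r"""\s+class\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_class_attributes(page: str) -> str:
    """Remove class attributes from tags, leaving the body text untouched."""
    return TAG.sub(lambda m: CLASS_ATTR.sub("", m.group(0)), page)


if __name__ == "__main__":
    # Hypothetical sample of over-classed converter output.
    sample = '<p class="c12">Open the valve.</p>'
    print(strip_class_attributes(sample))
```

Restricting the replacement to the inside of tags is deliberate: a
plain search and replace on "class=" could also change body text that
happens to contain those characters.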
<<I'd like to see the process be something like make corrections in
Word and generate PDFs and HTML using FrameMaker, ePublisher Pro, or
some equivalent program. At this point the PDF is not an issue with
them, it's the HTML. Anyone know of any programs that will convert Word
docs exactly as they appear into HTML?>>
The PDF shouldn't be an issue, and you shouldn't need to move to Frame.
(Frame is clearly more powerful, but if the current system uses Word,
stick with what you've got until you have a compelling reason to
change.) Simply purchase a copy of Acrobat, and print to PDF. Problem
solved for PDF.
HTML can't be made to precisely mimic a Word document because browser
windows are resizable, and the text should rewrap when users resize
them; a document that refuses to wrap makes no sense. Moreover, it
makes no sense to generate a separate HTML file for each page in the
Word document. Sheer nonsense! Each topic should become its own HTML
page, or perhaps a group of linked pages.
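
As a rough illustration of "one topic per HTML page", the sketch below
splits a converted document wherever a new topic starts. It assumes,
purely for the example, that every topic begins with an <h1> heading;
your converted files may mark topics differently, and the function name
is hypothetical.

```python
import re

# Assumption for this sketch: each topic starts at an <h1> element.
H1_START = re.compile(r"<h1[\s>]", re.IGNORECASE)


def split_topics(body: str) -> list[str]:
    """Split the body markup into chunks, one per <h1>-led topic."""
    starts = [m.start() for m in H1_START.finditer(body)]
    if not starts:
        return [body]  # no headings found: treat everything as one topic

    chunks = []
    preamble = body[:starts[0]]
    if preamble.strip():
        chunks.append(preamble)  # keep any content before the first heading

    # Each topic runs from its heading to the next heading (or the end).
    for begin, end in zip(starts, starts[1:] + [len(body)]):
        chunks.append(body[begin:end])
    return chunks
```

Each chunk could then be written to its own file and linked from a
table of contents, which is the opposite of one file per printed page.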
As noted above, Word's macro language will do enormous amounts of
cleanup in HTML files. To reduce the amount of cleanup, streamline the
style definitions used in the Word document. If you can map these
styles one to one to equivalent styles in a CSS stylesheet, that solves
much of the remaining problem. Again, a search and replace macro can
clean up the HTML files you generate so that they match the stylesheet.
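
The one-to-one mapping can also be scripted. This Python sketch renames
generated class names so that they match names in your stylesheet. The
class names on both sides of STYLE_MAP are invented for the example;
substitute whatever your converter emits and whatever your CSS defines.

```python
import re

# Hypothetical mapping: generated class name -> class name in your stylesheet.
STYLE_MAP = {
    "c1": "heading-1",
    "c2": "body-text",
    "c3": "note",
}

# A double-quoted class attribute (but not something like data-class="...").
CLASS_VALUE = re.compile(r'(?<![\w-])class="([^"]*)"')


def rename_classes(page: str, style_map: dict[str, str]) -> str:
    """Replace each mapped class name; leave unmapped names as they are."""
    def swap(match: re.Match) -> str:
        names = match.group(1).split()
        renamed = [style_map.get(name, name) for name in names]
        return 'class="{}"'.format(" ".join(renamed))

    return CLASS_VALUE.sub(swap, page)


if __name__ == "__main__":
    print(rename_classes('<p class="c2">Close the valve.</p>', STYLE_MAP))
```

The fewer styles the Word document uses, the shorter this table gets,
which is the point of streamlining the styles first.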
<<I'm working on persuading my bosses that really in the HTML documents
the Header information is not needed because they want any printing
that happens to come from the PDF and not the HTML and that HTML is
about making the information accessible and not the presentation.>>
Headers are easy: they become the title information for each HTML file
(i.e., the contents of the <title> element, which is the name that
appears at the top of the browser window when you open the file). The
rest of the header information is meaningless because there are no page
numbers in HTML... or there shouldn't be.
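
If you end up scripting the cleanup, moving the header text into the
title is a small step. The Python sketch below is one way to do it,
assuming each file has a <head> element; the function name and the
sample header text are hypothetical.

```python
import html
import re

TITLE = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)
# Matches <head> or <head ...>, but not <header>.
HEAD_OPEN = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)


def set_title(page: str, header_text: str) -> str:
    """Put the Word header text into the page's <title> element."""
    title = "<title>{}</title>".format(html.escape(header_text))
    if TITLE.search(page):
        # Replace the first existing title.
        return TITLE.sub(lambda m: title, page, count=1)
    # Otherwise insert a title right after the opening <head> tag.
    return HEAD_OPEN.sub(lambda m: m.group(0) + title, page, count=1)


if __name__ == "__main__":
    page = "<html><head></head><body><p>Text</p></body></html>"
    print(set_title(page, "Operations & Maintenance Manual"))
```

If a file has neither a <title> nor a <head>, this sketch returns it
unchanged, so check the converter's output before relying on it.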