TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
Subject: Re. HTML Pandora's box
From: Geoff Hart <geoff-h -at- MTL -dot- FERIC -dot- CA>
Date: Mon, 26 Jun 1995 08:42:29 LCL
Chet Ensign raised a disturbing prospect: in our rush to get ever
better HTML pages onto the Web, we create a version maintenance
problem of epic proportions. I think he's got the right idea about
discussing it here... 'cause guess who's gonna be doin' the
maintenance?
I'm facing this problem too, since we're hoping to archive our old
publications onto CD in the hope of someday making them available
online. (No immediate plans, but I know what'll happen if I don't
cover my ASCII now.) The rest of the post is long, but I hope it'll
give you insights into your own strategies.
Here's my interim solution: store the files in their original file
format, with graphics stored separately in their original formats. On
the same CD, we'll be storing the same version of the operating system
and applications (Windows 3.1, AmiPro and Corel) that created the
material. As long as we own a PC that will boot this version of the
operating system, we can recover the data for printing, export, etc. As
our applications evolve, we'll have to reopen the files in the new
versions of the application, resave them in the "current" format, and
so on into the future. In this manner, we use the "open files in
previous version's format" feature of the applications to do our
maintenance for us. (This approach _will_ break down at some point,
but as I noted, it's a reasonable interim solution.)
A longer-term solution is to store the files in ASCII and mark them up
with SGML tags. This offers advantages:
1. HTML is an application of SGML, so it should be possible to convert the
files into HTML automatically if we use plain vanilla SGML features
likely to be supported in HTML. It should even be possible to do this
in batch mode. Even if no utility exists, a simple macro program in a
good text editor should do much of the conversion, and we have a few
good programmers who can write a more complex macro (or even a
standalone C or BASIC program) if need be.
2. SGML is and will probably continue to be an international standard,
so writing rigorously to the standard should make maintenance easier
than in other solutions. I'd suggest that we replace HTML with SGML if
I thought we could convince the powers that be to make this change.
But as Chet notes, product differentiation is more important than
standardisation in the computer industry.
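To make the "simple macro" idea in point 1 concrete, here is a minimal
sketch in Python of a tag-renaming pass. It assumes plain vanilla markup
only: start and end tags with no attributes and no omitted end tags. The
element names in TAG_MAP are hypothetical and come from no real DTD;
substitute whatever your own documents use.

```python
import re

# Hypothetical mapping from in-house SGML element names to HTML elements.
# These names are assumptions made for illustration only.
TAG_MAP = {
    "title": "h1",
    "section": "h2",
    "para": "p",
    "emph": "em",
}

# Matches an attribute-free start or end tag, e.g. <para> or </para>.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def sgml_to_html(text):
    """Rename mapped tags; leave unmapped tags untouched for manual review."""
    def rename(match):
        slash = match.group(1)
        name = match.group(2).lower()
        if name not in TAG_MAP:
            return match.group(0)
        return f"<{slash}{TAG_MAP[name]}>"
    return TAG_RE.sub(rename, text)
```

Running every archived file through sgml_to_html in a loop would give the
batch mode mentioned above. Markup that uses attributes, entities, or
omitted end tags would need a real SGML parser rather than a pattern
match like this one.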
The final solution that I'm contemplating is using a full-text
database that also stores "binary large objects" (BLOBs), such as
graphics files. This would be particularly helpful if I can convert
the 8-bit ASCII text into 16-bit Unicode. The really nice thing
about this approach is that database software usually offers good file
management features, sometimes with a little programming, and decent
text searching performance and export facilities. With a little work
from our programmer friends, it should be possible to export the files
into a format that makes it easy to open them in a DTP or SGML
authoring program; for example, to drop a file into an SGML authoring
program, all you need to do is state that a field named "title" gets
saved to the file surrounded by the SGML tags that denote a title. I'm
leaning this way for a variety of reasons, most importantly that ASCII
and Unicode are eternal: this means no version control problems, and
to maintain your files, all you need to do is edit the export filter,
not the files themselves.
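The export filter described above could look something like this minimal
Python sketch. It assumes each database record arrives as a set of named
text fields; the field names, the tag names, and the function
export_record are all hypothetical, and the text is assumed to contain no
markup characters that would need escaping.

```python
# Hypothetical export filter: each pair maps a database field name to the
# SGML element that should surround it. Order here sets the output order.
EXPORT_FILTER = [
    ("title", "title"),
    ("author", "author"),
    ("body", "para"),
]


def export_record(record, export_filter=EXPORT_FILTER):
    """Wrap each stored field in the tags that the filter assigns to it."""
    lines = []
    for field, tag in export_filter:
        if field in record:  # skip fields this record does not have
            lines.append(f"<{tag}>{record[field]}</{tag}>")
    return "\n".join(lines)
```

The stored records never change. To target a different DTD or authoring
program, you would edit only the pairs in the filter, which is the
maintenance advantage claimed for this approach.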
This gets us part of the way there, particularly the latter approach,
but I'd be curious to see if anyone can poke holes in my strategy or
modify it to produce a better approach. I'm still woolgathering, but
at some point I'm going to have to make the sweater and your advice
will be greatly appreciated.
--Geoff Hart #8^{)} <--- lots'o'wool on top
geoff-h -at- mtl -dot- feric -dot- ca
Disclaimer: If I didn't commit it in print in one of
our reports, it don't represent FERIC's opinion.