TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
Subject: Re: Web indexing
From: John Seaman <jseaman -at- EUKC -dot- FREESERVE -dot- CO -dot- UK>
Date: Sat, 26 Jun 1999 15:18:23 +0100
wrt Rowena Hart's posting on indexing web sites.
I've done a couple of big HTML documents that needed an index.
I really wanted to supply a search engine with the documents, but the
customer couldn't accept any deliveries that included executable code or
anything that would require them to install a Java VM. This excluded all the
search engines I could find out about at the time, so I had to produce an
index.
I considered the format Rowena mentions:
>digital certificates, creating 1 2 3 4
>
>where the numbers are hyperlinks back to the referenced term.
but decided that simple numbers didn't tell the readers enough for them to
decide whether it was the bit of the document they needed to see. Instead,
I decided on subentries based on the title of the topic in which they
occurred. Something like this:
digital certificates, creating
    from scratch
    for awkward customers
    when Jupiter aligns with Mars
where each of the subentries was a hyperlink back to the referenced term.
When I got down to it, I found this was taking way too long to set up given
that I was working to the "corporate standard deadline" (i.e. current date
minus two days). Fortunately, the larger parts of the documents in question
were generated from databases so I could arrange for the index files to be
created semi-automatically. I imagine something similar could be set up for
any set of HTML files; a fairly straightforward bit of programming should be
able to search each file, copy everything between heading tags and squirt it
into a separate file, along with appropriate hyperlink tags to point back to
the file it came from.
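
A minimal sketch of that kind of program, in Python, might look like the
following. It is an illustration only, not the tool used on the project
described here: the function and class names are hypothetical, it assumes
headings are marked up with h1 to h6 tags, and it links each entry to the
file as a whole rather than to an anchor within it.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect the text found between heading tags (h1 to h6)."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished heading texts, in document order
        self._parts = None   # text fragments of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._parts is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append(text)
            self._parts = None


def extract_headings(html_text):
    """Return the heading texts found in one HTML document."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headings


def build_index(pages):
    """Build an HTML list of headings; pages maps file name to HTML text."""
    entries = []
    for filename, html_text in pages.items():
        for heading in extract_headings(html_text):
            entries.append((heading, filename))
    # Sort alphabetically by heading text, ignoring case.
    entries.sort(key=lambda entry: entry[0].lower())
    lines = ["<ul>"]
    for heading, filename in entries:
        link = escape(filename, quote=True)
        lines.append(f'<li><a href="{link}">{escape(heading)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


def read_pages(directory):
    """Read every .html file in a directory (assumed to be UTF-8)."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.html"))
    }
```

Calling build_index(read_pages("docs")) would return the index as a string
that can be saved as its own page; "docs" is a placeholder directory name.
Because the entries are taken straight from the headings, they keep the
headings' own wording.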
The disadvantage of the resulting index was that entries and subentries
consisted of the unrefined text of the topic headings. For example:
creating digital certificates
rather than
digital certificates, creating
which is what you would usually see in an index.
This could cause some frustration for readers of a paper document, but in an
online version, they could use their browser's Find function to search for
topics by key word. I added a short instruction on using the Find function
at the top of the index just to be sure. Feedback from the customers is
good so far.
I'm conscious of the fact that my solution was no better than an acceptable
workaround given the time constraints, so I'm looking forward to postings
from others who have come up with more considered solutions.