TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
For this code, grab the current snapshot or release 1.1 (Real Soon Now).
It is not in 1.0.
It is under the GNU General Public License, free to distribute, use, and modify.
I'd appreciate a copy of any interesting modifications.
WARNING: I wrote this to solve my own problem. It has not been designed
to handle all HTML files, or tested on any files other than my own. It is
almost certain to misbehave somehow if applied to a large collection of
files I didn't write. On the other hand, you get the code, so whatever
problems appear can almost certainly be fixed.
For anyone unfamiliar with the term, a permuted index takes a bunch of
one-line items, like the simple command descriptions in Unix man pages:
    grep -- search for patterns in text
and turns them into a series of lines something like this:
                           search for patterns in text    grep(1)
                search for patterns in text               grep(1)
    search for patterns in text                           grep(1)
where grep(1) refers to the page titled "grep" in section 1 of the
manual. Every "content word" in the input gives one line. Words
like "for", "in", "of" and "the" are skipped.
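As a rough illustration of that rule (not the code referred to above), here is a minimal Python sketch. The names permute, format_entry and STOP_WORDS, the stop-word list, and the column widths are all assumptions made for this example.

```python
# Hypothetical stop-word list; a real tool would likely use a longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def permute(description, ref, stop_words=STOP_WORDS):
    """Return one (left, right, ref) tuple per content word.

    'right' starts at the content word; 'left' is whatever precedes it.
    """
    words = description.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in stop_words:
            continue  # skipped words do not get a line of their own
        left = " ".join(words[:i])
        right = " ".join(words[i:])
        entries.append((left, right, ref))
    return entries


def format_entry(left, right, ref, left_width=24, right_width=30):
    """Right-align the left context so every keyword starts in the same column."""
    return f"{left:>{left_width}} {right:<{right_width}} {ref}"


for entry in permute("search for patterns in text", "grep(1)"):
    print(format_entry(*entry))
```

Run on the grep description, this gives three lines, one each for "search", "patterns" and "text", with the keyword always starting in the same column.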
This becomes interesting when you do it for every command and combine
the outputs, sorting on the central column so a piece of your index
is:
    search for patterns in text                        grep(1)
                           text processing language    awk(1)
                           text editor                 ed(1)
Now a user with some text-related problem can find all the utilities
that do something with text.
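The combine-and-sort step can be sketched the same way. This is only an illustration under stated assumptions: build_index, ITEMS and the stop-word list are hypothetical names, and the sort key is the keyword alone, with ties left in input order. A real tool might also sort on the words that follow the keyword.

```python
# Hypothetical stop-word list and sample input for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

ITEMS = [
    ("grep(1)", "search for patterns in text"),
    ("awk(1)", "text processing language"),
    ("ed(1)", "text editor"),
]


def build_index(items, stop_words=STOP_WORDS):
    """Permute every description, then sort all rows on the keyword."""
    rows = []
    for ref, description in items:
        words = description.split()
        for i, word in enumerate(words):
            if word.lower() not in stop_words:
                rows.append((" ".join(words[:i]), " ".join(words[i:]), ref))
    # The keyword is the first word of the right-hand part. Python's sort
    # is stable, so rows with the same keyword keep their input order.
    rows.sort(key=lambda row: row[1].split()[0].lower())
    return rows


index = build_index(ITEMS)
for left, right, ref in index:
    print(f"{left:>24} {right:<30} {ref}")
```

With these three inputs, the rows keyed on "text" end up next to each other, which is the grouping that lets a reader find every text-related utility in one place.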
This certainly isn't a replacement for a good index, table of contents
or cross-references, but I consider it a useful addition and would be
quite embarrassed to deliver any substantial set of docs without a
permuted index.