Subject: Searching through a batch of PDFs? Index them!
From: "Geoff Hart" <geoff-h -at- mtl -dot- feric -dot- ca>
To: TECHWR-L -at- lists -dot- raycomm -dot- com
Date: Tue, 12 Oct 1999 10:11:35 -0400
Darren Barefoot has a <<...growing number of technical
bulletins on our external Web site. [in PDF format] Currently,
they're in one big long page... and, in an ideal world,
I like to provide the user with a search field on a Web page
which would allow them to perform a full text search through
all of the PDF files.>>
Although I can't point you to a search tool that will meet
your need, I do have a far better suggestion for you (since I'm
one of those folks who find that search engines are rarely
worth my time): index the PDFs. The problem with all
full-text search engines is that they're simply not
context-sensitive (e.g., they can't tell whether "Report
Manager" refers to the person who manages your reporting
division or the software that lets users create reports).
And they won't be context-sensitive anytime soon.
In marked contrast, an index is something created by a
human who has empathy for how users will try to locate
information. By picking a few carefully chosen keywords for
each PDF file, you can greatly facilitate the task of finding
the best PDF file for a user's particular need. And even if you
don't automate the process of managing the index, it's not a
particularly painful task to maintain the index manually. For a
few more thoughts on this topic and a better exploration of
my rationale, check out my article "Index the Web"
(Intercom, June 1999, pp. 26-28). One update to the article:
HTML Indexer, which I mentioned almost dismissively as
'promising', has improved substantially since I first saw it
almost a year ago. Because of the publication time lag, the
article describes an older version; the product is now a
valuable production tool.
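
For anyone who wants to see what a hand-maintained keyword index
might look like in practice, here is a minimal sketch in Python.
Everything in it is an assumption made up for illustration: the
file names (tb-001.pdf and so on), the keywords, and the helper
names build_index and lookup. It is not how HTML Indexer or any
other product works. The point is only that a human picks a few
headings per PDF, including qualifiers that separate the two
senses of "Report Manager", and the lookup works on those
headings rather than on the full text.

```python
from collections import defaultdict

# Hypothetical, hand-maintained entries: a person chooses a few
# keywords for each PDF. File names and keywords are invented.
BULLETINS = {
    "tb-001.pdf": ["report manager (software)", "creating reports"],
    "tb-002.pdf": ["report manager (job role)", "reporting division"],
    "tb-003.pdf": ["creating reports", "printing"],
}


def build_index(entries):
    """Invert file -> keywords into keyword -> sorted list of files."""
    index = defaultdict(set)
    for filename, keywords in entries.items():
        for keyword in keywords:
            index[keyword.lower()].add(filename)
    # Sort headings alphabetically, as a printed index would.
    return {kw: sorted(files) for kw, files in sorted(index.items())}


def lookup(index, term):
    """Return every index heading that contains the term."""
    term = term.lower()
    return {kw: files for kw, files in index.items() if term in kw}


if __name__ == "__main__":
    index = build_index(BULLETINS)
    for heading, files in lookup(index, "Report Manager").items():
        print(f"{heading}: {', '.join(files)}")
```

A reader looking up "Report Manager" sees two separately labelled
headings and can pick the sense they meant, which is exactly the
distinction a full-text search cannot make. Maintaining the index
by hand means editing one dictionary entry per new bulletin.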
--Geoff Hart @8^{)} geoff-h -at- mtl -dot- feric -dot- ca (Pointe-Claire, Quebec)
"Perhaps there is something deep and profound behind all those sevens,
something just calling out for us to discover it. But I suspect that it
is only a pernicious, Pythagorean coincidence."
George Miller, "The Magical Number Seven" (1956)