Subject: Re: Need good tool for verifying links
From: "Eric J. Ray" <ejray -at- RAYCOMM -dot- COM>
Date: Fri, 25 Jun 1999 06:23:32 -0600
Walter Crockett wrote:
> Now we need to bring it from Windows NT into UNIX platform, and, among other
> things, the case sensitivity of the URLs is causing problems. We need a way
> to check all the links in the project quickly and also find out which links
> have mismatched cases, i.e., image.gif and IMAGE.gif.
>
> Anybody have experience fine-tuning Linkbot, or working with other
> link-checkers that can handle large projects? There's nothing in the archives
> on this.
First, if at all possible, actually DO your link checking on a unix
platform. Case sensitivity seems to be a fairly low priority (if a
priority at all) for Windows implementations of link checkers, and
relying on programs like Homesite to accurately _emulate_ case
sensitivity isn't reliable. (Yes, I'm speaking from experience here.)
(The Windows-versus-UNIX perspective shows in the way you, and
presumably many Windows programmers, think about the file names: you
see image.gif and IMAGE.gif as the same file name with different
capitalization, while those with a UNIX background see two distinctly
different filenames.)
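A quick way to gauge the scope of the problem once the files are on
the unix box: list every name that would collide on a case-insensitive
filesystem. A sketch in Bourne shell (adjust the starting directory to
suit your tree):

    # print each path (lowercased) that occurs more than once when
    # case is ignored--these are your image.gif/IMAGE.gif pairs
    find . -print | tr A-Z a-z | sort | uniq -d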
Linklint is a good choice for link checking--it's a Perl script, a
snap to set up, quite flexible, and it does exactly what it's supposed
to do. Search at www.yahoo.com or the Perl archive of your choice to
find it. You can specify many output format variations to easily find
the problematic input or target files.
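From memory, an invocation looks something like the following--the
option names may differ in your version, so check Linklint's own
documentation:

    # check everything under the site root, writing the reports
    # into the linkdoc directory (one file per problem category)
    linklint -root /path/to/site -doc linkdoc /@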
As far as fixing the links, you could take several approaches. If the
files are fairly lightly linked (that is, few navigation links or
cross-references within the files), I'd probably do much of the fixing
manually--just write a brief script to run linklint and open vi with
all of the problematic files to edit in sequence (see the sketch
below). Then turn the music up, your brain off, and go to it.
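Something along these lines, assuming linklint has already written its
reports into linkdoc/ and that one of them lists the bad files one per
line (the report name below is a placeholder--check what your version
actually produces):

    #!/bin/sh
    # re-run the link check, then walk the offenders in vi
    linklint -root /path/to/site -doc linkdoc /@
    files=`grep '\.htm' linkdoc/REPORT.txt`  # pull out the file names
    vi $files                                # :n moves to the next file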
If the files are heavily cross-linked, then I'd opt for Dick's
suggestion about writing a script to do the heavy lifting. In addition
to his suggestion of Lisp, a unix shell script and sed can do this, as
could perl (more cleanly than sed, actually). Assuming you're moving
everything to consistent lowercase, the script shouldn't be a huge
task (read, less than 20 lines in sed), even for a beginner at
scripting--something like the sketch below.
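A minimal version in sh and perl (untested as written, and it assumes
your links all use double-quoted href/src attributes--take a backup
first):

    #!/bin/sh
    # 1) lowercase every href/src value in the HTML files
    #    (perl's \L lowercases the rest of the replacement)
    for f in `find . -name '*.htm*'`; do
        perl -pi.bak -e 's/\b(href|src)(\s*=\s*")([^"]*)/$1$2\L$3/gi' "$f"
    done

    # 2) lowercase the file names themselves to match; watch for
    #    genuine collisions (image.gif and IMAGE.gif both existing)
    find . -type f | while read f; do
        dir=`dirname "$f"`; base=`basename "$f"`
        lc=`echo "$base" | tr A-Z a-z`
        [ "$base" != "$lc" ] && mv "$f" "$dir/$lc"
    done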
Additionally, you might check out HTML Tidy, written by Dave Raggett
of HP (he's one of the HTML and XML drivers at the W3C--see
www.w3.org). It'll do a wonderful job of cleaning up the HTML code to
give you something clean and consistent to start your link-fixing
with. (Starting with pristine, consistent, and clean input is the best
way to ensure successful search and replace operations like this one.)
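If memory serves, tidy can rewrite files in place, so running it over
the whole tree is a one-liner (the -m flag is from memory--check
tidy's usage message):

    # clean up every HTML file in place before the link surgery
    for f in `find . -name '*.htm*'`; do
        tidy -m "$f"
    done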