TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
Subject: Re: character string order
From: Sandy Harris <sharris -at- dkl -dot- com>
To: TECHWR-L <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Wed, 12 Jan 2000 11:56:11 -0500
Geoff Lane wrote:
>
> > -----Original Message-----
> > From: Benzi Schreiber
> >
> > I have a bunch of character strings that are sorted as follows:
> >
> > adam < armageddon < bob < beryl
> >
> > The programmers are calling this "lexicografical order", but
> > the closest
> > I've found is "lexicografic". Does the word "lexicografical" exist?
> > Does either of these words describe the order I'm using?
> ---
>
> "Lexicographical" exists in the Oxford English Dictionary.
British spelling, and certainly how I (Canadian) would spell it.
Are the forms with 'f' standard American or just errors?
(I'm mildly horrified either way :-)
> It is an adjective related to "compiling a dictionary" -- my
> interpretation is that it means "in dictionary order".
Mine, too. I'd use the simpler term "lexicographic order", but would
understand either.
> Your example is not in this order (it should be adam,
> armageddon, beryl, bob).
>
> If the strings only contain alpha characters and the sort is not
> case-sensitive, I'd describe the sorting as 'alphabetical'. Otherwise I'd
> name the method the program uses to sort (for example, "in ASCII
> order" or "in EBCDIC order") and give a simple example if necessary.
I agree there.
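To make the difference concrete, here is a small Python sketch (my own illustration, not anything from Benzi's program). It sorts the four words from the original question, with "Beryl" capitalized purely as an assumption to show the case effect. A plain sort in Python compares character codes, which for these strings matches ASCII order.

```python
# Words from the original question; "Beryl" is capitalized here only
# to show how case changes a character-code sort (an assumption).
words = ["bob", "Beryl", "adam", "armageddon"]

# Plain sort: compares character codes, so uppercase sorts before lowercase.
code_point_order = sorted(words)

# Case-insensitive sort: what most readers would call "alphabetical".
alphabetical_order = sorted(words, key=str.casefold)

print(code_point_order)    # ['Beryl', 'adam', 'armageddon', 'bob']
print(alphabetical_order)  # ['adam', 'armageddon', 'Beryl', 'bob']
```

Either result is defensible, which is why naming the method and showing one example helps the reader.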
I think "lexicographic order" implies some attempt to use rules beyond
that, though I'm not sure exactly what those should be. Some possible
rules would sort:
    "22" among the 't' words, sorted as "twenty-two"
    "22" after "3", although a straight character sort puts it before
    "St. Louis" as if it were spelled out "Saint L..."
    "St. Louis" as if it were "StLouis", ignoring non-alpha characters
Dictionaries don't use a simple ASCII sort. If you say you're sorting
in "lexicographic order", I'm going to assume that you're not either,
that your sort procedure implements some set of rules akin to those
above.
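Three of those rules can be sketched as sort keys in Python. This is only an illustration under my own assumptions: the function names and the city list are made up, the abbreviation rule handles nothing but "St.", and the "twenty-two" rule is left out because it needs a number-to-words table.

```python
import re

def numeric_key(s):
    # Compare digit strings by value, so "22" lands after "3".
    return int(s)

def spelled_out_key(s):
    # Treat "St." as if it were written "Saint" (only this one abbreviation).
    return s.replace("St.", "Saint").casefold()

def letters_only_key(s):
    # Ignore everything except the letters a-z, so "St. Louis" -> "stlouis".
    return re.sub(r"[^a-z]", "", s.casefold())

numbers = ["22", "3"]
cities = ["Stockton", "Salem", "St. Louis", "Stamford"]  # hypothetical sample data

plain_numbers = sorted(numbers)                       # ['22', '3']
by_value = sorted(numbers, key=numeric_key)           # ['3', '22']
plain_cities = sorted(cities, key=str.casefold)       # Salem, St. Louis, Stamford, Stockton
as_saint = sorted(cities, key=spelled_out_key)        # St. Louis, Salem, Stamford, Stockton
letters_only = sorted(cities, key=letters_only_key)   # Salem, Stamford, St. Louis, Stockton
```

Note that "St. Louis" ends up in three different places depending on the rule, which is exactly why "lexicographic order" needs to be spelled out for the reader.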
The Unix sort utility has a -d (dictionary order) option. A look at
its manual and some experimentation with the utility might be useful.
I suspect, though, that it is only a partial solution, implementing
some convenient subset rather than everything a lexicographer might
want.
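As I understand the POSIX description, -d makes only blanks and alphanumeric characters significant in comparisons. A rough Python approximation of that one rule, ignoring locale handling entirely (the function name and the sample words are my own), might look like this:

```python
def dictionary_key(s):
    # Keep only blanks and alphanumerics, roughly what POSIX says sort -d
    # treats as significant. Locale effects are not modelled here.
    return "".join(ch for ch in s if ch.isalnum() or ch in " \t")

lines = ["re-use", "read", "real"]  # hypothetical sample input

plain = sorted(lines)                                # hyphen sorts before letters
dictionary_like = sorted(lines, key=dictionary_key)  # hyphen is ignored

print(plain)            # ['re-use', 'read', 'real']
print(dictionary_like)  # ['read', 'real', 're-use']
```

A real sort implementation may behave differently depending on the locale settings, so treat this as a way to reason about the rule rather than a replacement for testing the utility itself.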
There was a classic paper on computerized sorting by such rules. I
don't recall if it was lexicographic rules or the different set used
in phone books. I thought it was by Knuth, but checking his home page: