>Well, let me put
>it this way. XML processing software (parsers, processors and so on)
>are supposed to be simple to create (read cheap and fast). By
>implication they cannot be very 'clever' and so the XML markup
>(unlike SGML) must be quite simple and very explicit.
That is potentially misleading. One of the design goals of XML was to make
it possible to write a parser quickly and easily. To achieve this, the
designers removed the ability to omit tags that can be inferred from the
context of the markup, removed the ability to use short forms of tags, and
placed restrictions on content models.
I don't believe Simon intends to imply that XML processing software cannot
be "clever". Software does not have to be very clever to parse XML, but
it does have to be extremely clever to provide robust and powerful
processing facilities for XML. XML reduces the parsing challenge relative
to SGML, but it does not reduce the processing challenge.
Similarly, I don't think Simon intends to imply that XML markup
languages must be simple relative to SGML. XML-based markup languages can
and should be just as sophisticated as SGML-based ones, according to what
the job requires. The difference between XML and SGML lies in the morphology
of the tags. It is not really a matter of being more or less explicit,
because both must be completely explicit. The difference is that in XML
every start and end tag must be present and no short forms are allowed.
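As a rough illustration of the "every tag must be present" rule, the sketch below feeds two versions of one stanza to an XML parser. It assumes Python's standard xml.etree.ElementTree module; the names WELL_FORMED, OMITTED_END_TAGS, is_well_formed, and line_texts are made up for this example and are not part of Simon's markup.

```python
import xml.etree.ElementTree as ET

# Fully tagged XML: every start tag has a matching end tag.
WELL_FORMED = """<stanza>
<line>See the happy moron!</line>
<line>He doesn't give a damn.</line>
</stanza>"""

# The same content with the </line> end tags left out, SGML style.
OMITTED_END_TAGS = """<stanza>
<line>See the happy moron!
<line>He doesn't give a damn.
</stanza>"""


def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def line_texts(text):
    """Return the text of each <line> child of the root element."""
    root = ET.fromstring(text)
    return [line.text for line in root.findall("line")]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(OMITTED_END_TAGS))
```

The XML parser makes no attempt to infer the missing end tags; it simply rejects the second version. That is the trade XML makes: a simpler parser in exchange for more verbose markup.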
To illustrate the difference, here's an SGML version of Simon's DTD and
marked-up document:
<!DOCTYPE poem [
<!ENTITY % inline "#PCDATA|emphasis">
<!ELEMENT poem o o (front, body)>
<!ELEMENT front o o (title, author, rev+)>
<!ELEMENT title o o (%inline;)*>
<!ELEMENT author - o (%inline;)*>
<!ELEMENT rev - o (%inline;)*>
<!ELEMENT body o o (stanza|line)+>
<!ELEMENT stanza - o (line)+>
<!ELEMENT line - o (%inline;)*>
<!ELEMENT emphasis - - (%inline;)*>
]>
Unknown
<author>Anonymous, from the Eugenics Review, 1929
<rev>1998-4-08: XML markup added
<rev>1998-4-17: Changed to SGML markup
<stanza>
<line>See the happy moron!
<line>He doesn't give a damn.
<line>I wish I were a moron,
<line>My God! perhaps I am!
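The two omission flags that follow each element name in the declarations above can be pulled out mechanically. The sketch below is only an illustration under simple assumptions: it uses a regular expression rather than a real DTD parser, it copies just a few of the declarations, and the names DTD_SAMPLE, ELEMENT_RE, and omission_table are invented for this example.

```python
import re

# A few of the element declarations from the DTD above.
DTD_SAMPLE = """\
<!ELEMENT poem o o (front, body)>
<!ELEMENT title o o (%inline;)*>
<!ELEMENT stanza - o (line)+>
<!ELEMENT line - o (%inline;)*>
<!ELEMENT emphasis - - (%inline;)*>
"""

# Captures: element name, start-tag flag, end-tag flag.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+([oO-])\s+([oO-])\s")


def omission_table(dtd):
    """Map element name -> (start tag omissible, end tag omissible)."""
    table = {}
    for name, start_flag, end_flag in ELEMENT_RE.findall(dtd):
        table[name] = (start_flag.lower() == "o", end_flag.lower() == "o")
    return table


if __name__ == "__main__":
    for name, (start_ok, end_ok) in omission_table(DTD_SAMPLE).items():
        print(name, "start omissible:", start_ok, "end omissible:", end_ok)
```

Read this way, poem and title may lose both tags, stanza and line may lose only their end tags, and emphasis must always be fully tagged.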
I have removed the unnecessary item tag, allowing clearer markup of
revisions. The "o o", "- o", and "- -" in the element declarations indicate
which tags can be omitted: the first character applies to the start tag and
the second to the end tag. "o" means the tag can be omitted, "-" that it
must be present. This example doesn't include any short forms, because the
DTD declarations needed for them would be too long. Using them, it would be
possible to reduce the markup to something like this:
Unknown
by Anonymous, from the Eugenics Review, 1929
~1998-4-08: XML markup added
~1998-4-17: Changed to SGML markup
~1998-4-17: SGML shortrefs added
<stanza>
See the happy moron!
He doesn't give a damn.
I wish I were a moron,
My God! perhaps I am!
This is much easier to read and write (a key goal of SGML) but, as you can
imagine, much harder to parse (ease of parsing being a key goal of XML).
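To give a feel for the extra work that tag omission pushes onto the parser, here is a deliberately tiny sketch that restores omitted end tags for stanza and line only. It is not an SGML parser and does not read the DTD: the IMPLIED_END rule table is hand-written for this example, and the names TAG_RE, add_end_tags, and repaired_line_texts are hypothetical. Short forms are not handled at all.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written assumption: a start tag for the key implicitly ends any
# open element named in the value set.
IMPLIED_END = {
    "line": {"line"},
    "stanza": {"line", "stanza"},
}

TAG_RE = re.compile(r"<(/?)(\w+)>")


def add_end_tags(text):
    """Insert the end tags that the markup leaves to be inferred."""
    out = []
    stack = []  # names of currently open elements
    pos = 0
    for match in TAG_RE.finditer(text):
        out.append(text[pos:match.start()])  # character data before the tag
        pos = match.end()
        closing, name = match.group(1), match.group(2)
        if closing:
            # Explicit end tag: close open elements down to and including it.
            while stack:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        else:
            # Start tag: first close whatever it implicitly terminates.
            while stack and stack[-1] in IMPLIED_END.get(name, set()):
                out.append("</%s>" % stack.pop())
            stack.append(name)
            out.append("<%s>" % name)
    out.append(text[pos:])
    while stack:  # end of input closes everything still open
        out.append("</%s>" % stack.pop())
    return "".join(out)


def repaired_line_texts(text):
    """Repair the markup, parse it as XML, and return each line's text."""
    root = ET.fromstring(add_end_tags(text))
    return [line.text.strip() for line in root.findall("line")]
```

Even this toy version needs a stack and a rule table, and a real SGML parser has to derive those rules from the content models in the DTD. An XML parser needs none of this, because every end tag is already in the document.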
---
Mark Baker
Manager, Corporate Communications
OmniMark Technologies Corporation
1400 Blair Place
Gloucester, Ontario
Canada, K1J 9B8
Phone: 613-745-4242
Fax: 613-745-5560
Email: mbaker -at- omnimark -dot- com
Web: http://www.omnimark.com