Subject: Re: Unicode (was Re: HTML, ASCII, and Homesite)
From: "Jeanne A. E. DeVoto" <jaed -at- JAEDWORKS -dot- COM>
Date: Tue, 5 Jan 1999 20:41:02 -0800
At 3:31 PM -0800 1/5/99, AlQuin wrote:
>On 04-01-1999 23:09 Mark Baker wrote:
>>At some time in the future we may or may not see Unicode widely adopted, in
>>which case a lot of these problems go away.
>
>Another HTML alternative? Can you generate some more information on
>Unicode?
Unicode is not a markup language like HTML, but an expanded "universal"
character set.
The ASCII character set uses 7 bits per character and can encode a total of
128 characters (the upper- and lower-case letters, digits, basic
punctuation, and control characters). Since ASCII lacks a lot of characters
people want to use (such as accented vowels), it has been extended to 8
bits (256 characters) - there are several such 256-character standards.
Unicode is a further, much broader extension, built around 16-bit character
codes (65,536 values) - enough so that you can write most human languages
in Unicode characters.
Unicode includes, among much else, the em dash that started this thread a
few days ago.
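
To make those sizes concrete, here is a small Python sketch. It is only an
illustration of the arithmetic above; the function and variable names are
made up for this example and are not part of any standard.

```python
# Illustrative sketch only: how many distinct values fit in a given number
# of bits, and which of those code spaces the em dash fits into.

def code_space(bits):
    """Return the number of distinct values representable in `bits` bits."""
    return 2 ** bits

ASCII_SIZE = code_space(7)       # 7-bit ASCII
EXTENDED_SIZE = code_space(8)    # the 8-bit, 256-character standards
UNICODE16_SIZE = code_space(16)  # the 16-bit space described above

EM_DASH = "\u2014"               # the em dash, Unicode code U+2014
em_dash_code = ord(EM_DASH)      # its numeric value

def fits(code, bits):
    """True if `code` is small enough to be stored in `bits` bits."""
    return 0 <= code < code_space(bits)

if __name__ == "__main__":
    print(ASCII_SIZE, EXTENDED_SIZE, UNICODE16_SIZE)
    print(fits(em_dash_code, 7), fits(em_dash_code, 8), fits(em_dash_code, 16))
```

The em dash has a code above 255, so it fits in neither the 7-bit nor the
8-bit space, but it does fit in 16 bits.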
There is some relationship between HTML and Unicode: HTML 3.2, by default,
used ISO Latin-1 (ISO 8859-1), the most widespread of the 256-character
standards. HTML 4.0 uses Unicode as its document character set. Since ISO
Latin-1 is a subset of Unicode - the first 256 characters of Unicode are
the same as the 256 characters in ISO Latin-1 - Unicode is backward
compatible with the older standard.
Unfortunately, the browser makers are still working on forward
compatibility. ;-)
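
If you want to check the subset relationship yourself, the following Python
sketch does so. It assumes Python's built-in "latin-1" codec as the
stand-in for ISO Latin-1, and the helper names are invented for this
example.

```python
# Illustrative sketch only: check that each of the 256 Latin-1 byte values
# decodes to the Unicode character with the same number, and show that
# characters outside that range cannot be written in Latin-1.

def latin1_matches_unicode():
    """True if every Latin-1 byte value maps to the same Unicode code."""
    for value in range(256):
        ch = bytes([value]).decode("latin-1")
        if ord(ch) != value:
            return False
    return True

def encodable_in_latin1(text):
    """True if every character in `text` exists in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    print(latin1_matches_unicode())
    print(encodable_in_latin1("caf\u00e9"))  # e with acute accent, U+00E9
    print(encodable_in_latin1("\u2014"))     # em dash, U+2014
```

An accented vowel such as U+00E9 survives the trip to the older standard;
the em dash does not, which is the one-way nature of the compatibility
described above.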