TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
Can anyone point me to information on character encoding? Here is my situation. I receive "text" files from software developers. From these files I create (via a Perl script) an XML file that I transform to HTML or load into Structured Frame.

I thought all I would have to do was convert the characters that would otherwise confuse the XML (<, ', etc.). But the issue seems to be more complex than that, and I don't even know exactly how to phrase the problem. I think it stems from the fact that the developers unknowingly submit different kinds of "text" (some Unicode, some UTF-8, some ASCII; please excuse me if I am being imprecise here, as I am fuzzy on these issues).

For instance, the developers often copy and paste from Word and include smart quotes. In a text editor I can search for these and replace them with straight quotes, but I can't figure out how to get Perl to do this automatically. If I leave the smart quotes in, Structured Frame interprets the characters correctly, but in the HTML document I produce via XSLT the smart quotes show up as boxes.

I know that the encoding can be declared in the XML and HTML files themselves, but I am not sure how to fit this all together or how to deal with the problem. Does anyone have experience handling these kinds of issues? Is there a way of detecting what encoding a "text" file is in? Is there a way of automatically converting the files to a particular chosen encoding?
Thanks,
Paul
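The three things Paul asks about (detecting the encoding, converting to one chosen encoding, and straightening smart quotes automatically) can be sketched as below. This is a minimal illustration in Python rather than Perl, under stated assumptions: the `FALLBACK_ENCODINGS` order, the `SMART_PUNCT` table, the helper names and the `<doc>` root element are all hypothetical. Only a byte-order mark gives a reliable answer about encoding; without one, trying decodings in order is a heuristic, not true detection.

```python
import codecs
from xml.sax.saxutils import escape

# Hypothetical fallback order for files without a byte-order mark: UTF-8
# first (plain ASCII is also valid UTF-8), then Windows-1252, the code page
# that text pasted from Word on Windows often ends up in.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")

# Word "smart" punctuation mapped to plain ASCII (an illustrative table).
SMART_PUNCT = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}
_SMART_TABLE = str.maketrans(SMART_PUNCT)


def decode_bytes(raw: bytes) -> tuple[str, str]:
    """Guess how raw bytes are encoded; return (decoded text, encoding used)."""
    # A byte-order mark (BOM) is the only reliable in-file signal.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick byte order, then drops it.
        return raw.decode("utf-16"), "utf-16"
    # No BOM: trial decoding is only a heuristic, not true detection.
    for enc in FALLBACK_ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this never fails.
    return raw.decode("latin-1"), "latin-1"


def straighten_quotes(text: str) -> str:
    """Replace Word-style smart punctuation with ASCII equivalents."""
    return text.translate(_SMART_TABLE)


def text_file_to_xml(raw: bytes, straighten: bool = True) -> bytes:
    """Wrap one developer text file in a minimal UTF-8 XML document."""
    text, _ = decode_bytes(raw)
    if straighten:
        text = straighten_quotes(text)
    # escape() replaces &, < and >, which is enough for element content.
    body = escape(text)
    # The prolog declares the encoding that the returned bytes actually use.
    doc = '<?xml version="1.0" encoding="UTF-8"?>\n<doc>' + body + "</doc>\n"
    return doc.encode("utf-8")


if __name__ == "__main__":
    # Simulate a file saved from Word on Windows (cp1252 curly quotes).
    sample = "\u201cHello\u201d & <world>".encode("cp1252")
    print(text_file_to_xml(sample).decode("utf-8"))
```

Run directly, this prints an XML prolog followed by `<doc>"Hello" &amp; &lt;world&gt;</doc>`. The design choice is to decode every input to Unicode once and write a single, declared encoding. With `straighten=False` the curly quotes survive as UTF-8; whether they then display in the HTML, rather than as boxes, plausibly depends on the XSLT output encoding (for example `<xsl:output method="html" encoding="UTF-8"/>`) and on the browser font having the glyphs. In Perl, the core Encode module's `decode` and `encode` functions support the same decode-once, encode-once approach.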