Subject: RE: HTML to XML conversion
From: Jason Willebeek-LeMair <jlemair -at- cisco -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Tue, 15 Jan 2002 08:41:49 -0600
You can use HTML Tidy to batch convert a set of HTML files to XML.
However, what you get is an XML document that still uses the HTML vocabulary (that is, XHTML).
Then, you can write an XSLT stylesheet to convert those files to a target document type
(such as DocBook). You will probably have some manual cleanup to do later,
especially if your target markup is semantically richer than HTML.
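
For anyone who wants to see the shape of that two-step pipeline, here is a rough Python sketch. It is not the process we actually ran: the function names, the tag mapping, and the "article" wrapper are made up for illustration, and the second step uses Python's built-in ElementTree as a stand-in for XSLT, since the standard library has no XSLT processor. The Tidy options shown (-asxml, -numeric, -quiet, -o) are standard command-line options; -numeric is there so that named entities such as &nbsp; do not trip up the XML parser later.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, deliberately tiny mapping from HTML element names to DocBook.
# A real conversion needs many more rules (sections, tables, links, ...).
TAG_MAP = {
    "h1": "title",
    "p": "para",
    "ul": "itemizedlist",
    "ol": "orderedlist",
    "em": "emphasis",
    "code": "literal",
}


def tidy_to_xml(src: Path, dest: Path) -> int:
    """Step 1: run Tidy on one file (assumes `tidy` is on the PATH).

    Returns Tidy's exit status: 0 = clean, 1 = warnings, 2 = errors.
    """
    cmd = ["tidy", "-asxml", "-numeric", "-quiet", "-o", str(dest), str(src)]
    return subprocess.run(cmd).returncode


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on element names."""
    return tag.rsplit("}", 1)[-1]


def convert(elem, unmapped):
    """Step 2: recursively rename one XHTML element to its DocBook equivalent."""
    name = local_name(elem.tag)
    if name == "li":
        # DocBook list items wrap their text in a para.
        out = ET.Element("listitem")
        target = ET.SubElement(out, "para")
    else:
        if name not in TAG_MAP:
            unmapped.add(name)  # no rule: keep the element, flag it for cleanup
        out = target = ET.Element(TAG_MAP.get(name, name))
    target.text = elem.text
    out.tail = elem.tail
    for child in elem:
        target.append(convert(child, unmapped))
    return out


def xhtml_to_docbook(xml_text: str):
    """Convert the <body> of a tidied document; also report unmapped tags."""
    root = ET.fromstring(xml_text)
    body = next((e for e in root.iter() if local_name(e.tag) == "body"), root)
    article = ET.Element("article")
    unmapped = set()
    for child in body:
        article.append(convert(child, unmapped))
    return article, unmapped
```

To batch the first step, you could loop over something like Path("docs").glob("*.html") and call tidy_to_xml on each file (the "docs" folder is a placeholder). The set of unmapped tag names returned by xhtml_to_docbook is a cheap way to list what still needs attention by hand.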
We did this for 1000+ files, and it worked pretty well. However, if you only
need to do 1 or 2 HTML files, I would suggest doing it by hand. You will
spend less time doing that than writing the XSLT.
Jason