Subject: Re: Searching Text Files
From: "Edgar D' Souza" <edgar -dot- b -dot- dsouza -at- gmail -dot- com>
To: Ed <glassnet -at- gmail -dot- com>
Date: Mon, 22 Sep 2008 16:18:19 +0530
I've rearranged your post a bit so I can reply in a different order :-)
On Mon, Sep 22, 2008 at 3:50 PM, Ed <glassnet -at- gmail -dot- com> wrote:
> I know there are Windows programs I could buy to do this. However, I'd
> prefer to stay with a command line solution, like Linux or DOS.
Er - not quite sure what your current platform is, but Python, Perl,
and XSLT processors are available on Linux as well as Windows, and one
of these (as well as other possible solutions) could be your answer.
> There is a directory of xml text files that I'd like to extract
> information from, and output to a textfile.
You could use Python or Perl with XML parser modules, or an XSLT
processor (which I'm a raw n00b at, so won't say much about).
You can also use regular expressions in both Python and Perl, if you
don't want to go the XML parser way.
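
As a rough illustration of the regular-expression route, here is a minimal Python sketch that pulls the text out of every <acroterm> element in a string. It assumes the tag carries no attributes and is never nested inside itself; the names ACROTERM_RE and extract_acroterms are made up for this example.

```python
import re

# Non-greedy match of whatever sits between the opening and closing tag.
# Assumption: <acroterm> has no attributes and is never nested in itself.
ACROTERM_RE = re.compile(r"<acroterm>(.*?)</acroterm>", re.DOTALL)


def extract_acroterms(text):
    """Return the contents of every <acroterm> element found in text."""
    return [match.strip() for match in ACROTERM_RE.findall(text)]


sample = "<p>The <acroterm>SEC</acroterm> and the <acroterm>FCC</acroterm>.</p>"
print(extract_acroterms(sample))  # prints ['SEC', 'FCC']
```

A regex like this is fine for flat, predictable markup; if the tags can carry attributes or nest, an XML parser is the safer choice.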
If you do decide to try Python and XML parsing, I would strongly
recommend that you avoid xml.etree.ElementTree and xml.dom.minidom and
instead download and install lxml (http://codespeak.net/lxml/ ).
> The information appears
> between an open/close tag, for instance, <acroterm>SEC</acroterm>. Of
> course there are a myriad of possibilities, so there must be some
> wild-carding too.
You can use wildcards with regular expressions; I don't know about
using them with XML parser libraries.
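
For the wild-carding part, one possible approach with Python's re module is to let the pattern match a family of tag names and use a backreference so the closing tag has to agree with the opening one. This is only a sketch: the "acro" prefix, the <acrodef> tag, and the function name extract_by_prefix are assumptions invented for the example, not something from your files.

```python
import re


def extract_by_prefix(text, prefix="acro"):
    """Return (tag, content) pairs for elements whose tag starts with prefix."""
    # (prefix\w*) captures the full tag name; \1 forces the closing tag to
    # match it. Assumption: the tags have no attributes.
    pattern = re.compile(r"<(%s\w*)>(.*?)</\1>" % re.escape(prefix), re.DOTALL)
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]


sample = (
    "<acroterm>SEC</acroterm> "
    "<acrodef>Securities and Exchange Commission</acrodef> "
    "<note>skip me</note>"
)
for tag, content in extract_by_prefix(sample):
    print(tag, content)
```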
> And then there is the recursion necessary to process
> all files of a particular type, such as *.xml.
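
For the recursion, a minimal sketch in Python could walk the directory tree with os.walk, pick out the *.xml files with fnmatch, and write each match to a text file. The function names, the tab-separated output layout, and the UTF-8 encoding are all assumptions made for illustration.

```python
import fnmatch
import os
import re

# Same simple pattern as before: no attributes, no nesting (assumption).
ACROTERM_RE = re.compile(r"<acroterm>(.*?)</acroterm>", re.DOTALL)


def find_files(root, pattern="*.xml"):
    """Yield the path of every file under root whose name matches pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(fnmatch.filter(filenames, pattern)):
            yield os.path.join(dirpath, name)


def extract_to_file(root, out_path):
    """Write one 'path<TAB>term' line per match; return the number written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in find_files(root):
            # Assumption: the XML files are UTF-8 encoded.
            with open(path, encoding="utf-8") as xml_file:
                text = xml_file.read()
            for term in ACROTERM_RE.findall(text):
                out.write("%s\t%s\n" % (path, term.strip()))
                count += 1
    return count
```

Calling something like extract_to_file("docs", "acroterms.txt") would then scan a hypothetical docs directory and everything below it.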