XPath and docx documents

I’ve just been writing a quick script to rip a docx document into pod format for something internal, and it’s made me stretch my XPath skills a bit further. XPath axes come in handy when you want something more targeted than // (which searches the whole document) but you aren’t sure exactly how deep a child element is. Then you can do something like “descendant::a:blip”, which finds the a:blip element anywhere down the tree from your current element.
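To show the idea in a runnable form, here is a small sketch in Python rather than the Perl the module uses. Python’s standard-library ElementTree doesn’t support named axes like “descendant::”, but a path beginning with .// evaluated from an element selects the same nodes as descendant:: from that element. The namespace URIs are the real Office Open XML ones; the XML fragment and the describe_paragraphs helper are hypothetical and heavily trimmed from what Word actually writes.

```python
import xml.etree.ElementTree as ET

# Office Open XML namespace URIs used in word/document.xml.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}
# Clark-notation name of the r:embed attribute that points at the image part.
RID = f"{{{NS['r']}}}embed"

# Hypothetical, trimmed document: the a:blip sits many levels below its w:p.
SAMPLE = """<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>
    <w:p><w:r><w:t>Before the picture</w:t></w:r></w:p>
    <w:p><w:r><w:drawing><wp:inline><a:graphic><a:graphicData>
      <pic:pic><pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill></pic:pic>
    </a:graphicData></a:graphic></wp:inline></w:drawing></w:r></w:p>
    <w:p><w:r><w:t>After</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def describe_paragraphs(root):
    """Return (text, [image relationship ids]) for each paragraph."""
    out = []
    for para in root.iterfind(".//w:p", NS):
        # Relative to `para`, ".//w:t" and ".//a:blip" behave like
        # "descendant::w:t" / "descendant::a:blip": any depth, but only
        # inside this paragraph, unlike a document-wide "//a:blip".
        text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS))
        images = [blip.get(RID) for blip in para.iterfind(".//a:blip", NS)]
        out.append((text, images))
    return out


if __name__ == "__main__":
    for text, images in describe_paragraphs(ET.fromstring(SAMPLE)):
        print(repr(text), images)
```

Scoping the search to the current paragraph is what lets you tie each image back to where it appears in the text, which a single document-wide // query can’t do on its own.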

I’m putting the module I’ve just written on GitHub for now. It’s definitely not good enough for CPAN, but it might be useful as a reference, so I figure it’s worth sharing. It does a very basic rip of the XML from a docx file (a docx is basically a zip archive of XML files and other resources), and there is a script to turn that into simple pod.
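As a rough illustration of those two steps (this is not the code in the repository), here is a Python sketch that opens the docx as a zip, parses word/document.xml, and emits very plain pod. It assumes headings use Word’s built-in style IDs “Heading1” and “Heading2”, which may not hold for every document; read_document_xml, to_pod and make_docx are hypothetical names, and make_docx builds a bare-bones zip for testing, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_document_xml(docx_bytes):
    """Open a .docx (a zip archive) and parse its main part, word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return ET.fromstring(zf.read("word/document.xml"))


def to_pod(root):
    """Very basic pod: assumed Heading1/Heading2 style IDs become =head1/=head2,
    other non-empty top-level paragraphs become ordinary pod paragraphs."""
    blocks = []
    for para in root.iterfind("w:body/w:p", NS):
        # A paragraph's text is split across runs; join every w:t beneath it.
        text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS)).strip()
        if not text:
            continue
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else ""
        if style_id in ("Heading1", "Heading2"):
            blocks.append(f"=head{style_id[-1]} {text}")
        else:
            blocks.append(text)
    # Pod paragraphs are separated by blank lines; =pod/=cut bracket the block.
    return "=pod\n\n" + "\n\n".join(blocks) + "\n\n=cut\n"


def make_docx(document_xml):
    """Hypothetical test helper: a zip containing only word/document.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


SAMPLE_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(to_pod(read_document_xml(make_docx(SAMPLE_XML))))
```

This ignores tables, lists, images and inline formatting, and it doesn’t escape text that happens to start with “=”, so treat it as a starting point rather than a converter.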

https://github.com/colinnewell/WordDocxScraper
