xpquery - apply XPath statements to XML or HTML data

Use xpquery to apply XPath statements to XML data. xpquery requires the Perl modules XML::LibXML and XML::LibXML::XPathContext. It is good for quick extraction of data, or for easily testing XPath queries. Some HTML can also be parsed via the -p HTML option. This relies on the HTML parsing support in libxml2, and will fail on poorly formed HTML content.
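For a rough idea of what such a query looks like inside a script, here is a minimal sketch using the two modules named above. It is not the xpquery source; run_query is a hypothetical helper, and the sample document is assumed from the foo.xml output shown in the examples on this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper (not part of xpquery): parse an XML string,
# evaluate one XPath expression, and return each matching node
# serialized back to text.
sub run_query {
    my ($xml, $expr) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    return map { $_->toString } $xpc->findnodes($expr);
}

# Assumed sample data, modeled on the foo.xml examples.
my $sample = '<a><b>text</b> more text</a>';

for my $expr ('//a', '//a/text()', '//a/*') {
    print "$expr\n";
    print "  [$_]\n" for run_query($sample, $expr);
}
```

The brackets around each result make leading whitespace in text nodes visible, which helps when comparing text() against descendant::text().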

Another option for scripts is the XML::XPath module, which offers convenience methods for accessing query results. Other XML processing options include XSH.
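A short sketch of the XML::XPath convenience methods, assuming the same small sample document as the foo.xml examples; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

# Assumed sample data, modeled on the foo.xml examples.
my $xp = XML::XPath->new(xml => '<a><b>text</b> more text</a>');

# findvalue gives the string value of the match directly.
my $b_text = $xp->findvalue('//a/b');
print "b contains: $b_text\n";

# findnodes returns the matching nodes in list context;
# as_string serializes a node back to markup.
for my $node ($xp->findnodes('//a/*')) {
    print XML::XPath::XMLParser::as_string($node), "\n";
}
```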

Example uses of xpquery:

  • Testing XPath expressions. Consult the XPath reference documentation for help when experimenting.

    $ xpquery foo.xml '//a'
    <a><b>text</b> more text</a>
    $ xpquery foo.xml '//a/text()'
    more text
    $ xpquery foo.xml '//a//text()'
    text more text
    $ xpquery foo.xml '//a/descendant::text()'
    text more text
    $ xpquery foo.xml '//a/*'
    <b>text</b>
    $ xpquery foo.xml '//a/descendant::*'
    <b>text</b>

  • Microsoft.com fails parsing. How unexpected!

    $ xpquery -p HTML \
      http://www.microsoft.com/ '//title'
    /tmp/xquery.B5dv6L:1: HTML parser error : Couldn't find end of Start Tag meta
    ‾㰠楴汴㹥楍牣獯景⁴潃灲牯瑡潩㱮琯瑩敬‾㰠敭慴栠瑴⵰?
    ^
    ERROR: Invalid expression

    Then again, so do many other sites, including my own. Parsing such HTML requires the forgiving HTML::Parser module, though most needs will be better met by a wrapper module around HTML::Parser, such as HTML::TokeParser. Ideally sites would offer XML service interfaces, to eliminate the error-prone scraping of data from HTML, or worse, the locking up of data in annoying JavaScript and Flash. </rant>
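For pages that libxml2 rejects, the HTML::TokeParser route might look something like the sketch below. The extract_title helper and the sloppy sample markup are made up for illustration; fetching the page over HTTP is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of the first <title>
# element, or nothing if the document has no title.
sub extract_title {
    my ($html) = @_;
    # A scalar reference is treated as the document itself.
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser: $!";
    # get_tag skips ahead to the next <title> start tag, if any.
    return unless $parser->get_tag('title');
    # Collect text up to the closing </title>, whitespace trimmed.
    return $parser->get_trimmed_text('/title');
}

# Made-up, deliberately sloppy markup: an unterminated meta tag,
# an unclosed paragraph, and no closing body or html tags.
my $sloppy = '<html><head><title> Example page </title>'
           . '<meta name="x" content="y"<body><p>unclosed paragraph<br>';

my $title = extract_title($sloppy);
print defined $title ? "$title\n" : "no title found\n";
```

The token-based approach reads the title without needing the rest of the document to be well formed, which is the main reason to prefer it over an XPath query for scraping.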