xpquery - apply XPath statements to XML or HTML data
Use xpquery to apply XPath statements to XML data. xpquery requires the XML::LibXML and XML::LibXML::XPathContext modules. It is good for quick extraction of data, or for testing XPath queries. Some HTML can also be parsed via the -p HTML option. This relies on the HTML parsing support in libxml2, and will fail on poorly formed HTML content.
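For use inside a script, the same two modules can be called directly. What follows is a minimal sketch and not xpquery's actual source: the inline sample document and the run_query helper are assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample document, standing in for a file on disk.
my $xml = '<doc><a><b>text</b> more text</a></doc>';

# Parse the string into a DOM tree; load_xml dies on malformed XML.
my $doc = XML::LibXML->load_xml( string => $xml );
my $xpc = XML::LibXML::XPathContext->new($doc);

# Hypothetical helper: evaluate an XPath expression against the document
# and return each matching node serialized as a string.
sub run_query {
    my ($expr) = @_;
    return map { $_->toString } $xpc->findnodes($expr);
}

# Print every match for a few expressions, bracketed so that leading
# whitespace in text nodes stays visible.
for my $expr ( '//a', '//a/text()', '//a//text()', '//a/*' ) {
    print "$expr\n";
    print "  [$_]\n" for run_query($expr);
}
```

In this sketch '//a/text()' matches only the text node that is a direct child of a, while '//a//text()' also matches the text inside b.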
Another option for scripts: use the XML::XPath module, which offers convenience methods for accessing query results. Other XML parsing options include XSH.
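A rough sketch of the XML::XPath route, using a made-up sample document (an assumption for illustration only) and three of the module's methods: findvalue, findnodes, and exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document for illustration.
my $xml = '<doc><a><b>text</b> more text</a></doc>';
my $xp  = XML::XPath->new( xml => $xml );

# findvalue returns the string value of whatever the expression matches.
my $b_text = $xp->findvalue('//a/b');
print "b contains: $b_text\n";

# In list context, findnodes returns the matching nodes themselves.
for my $node ( $xp->findnodes('//a/*') ) {
    print "child element: ", $node->getName, "\n";
}

# exists reports whether the expression matches anything at all.
print "no c element found\n" unless $xp->exists('//c');
```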
Example uses of xpquery:
Testing XPath expressions. Consult the XPath reference documentation for help when experimenting.
$ xpquery foo.xml '//a'
<a><b>text</b> more text</a>
$ xpquery foo.xml '//a/text()'
more text
$ xpquery foo.xml '//a//text()'
text more text
$ xpquery foo.xml '//a/descendant::text()'
text more text
$ xpquery foo.xml '//a/*'
<b>text</b>
$ xpquery foo.xml '//a/descendant::*'
<b>text</b>
Microsoft.com fails parsing. How unexpected!
$ xpquery -p HTML \
  http://www.microsoft.com/ '//title'
/tmp/xquery.B5dv6L:1: HTML parser error : Couldn't find end of Start Tag meta
‾㰠楴汴㹥楍牣獯景⁴潃灲牯瑡潩㱮琯瑩敬‾㰠敭慴栠瑴⵰?
^
ERROR: Invalid expression
Then again, so do many other sites, including my own. Parsing HTML data requires the forgiving HTML::Parser module, though most needs will be better met by a wrapper module around HTML::Parser, such as HTML::TokeParser. Ideally sites would offer XML service interfaces, to eliminate error-prone HTML scraping or, worse, data locked up in annoying JavaScript and Flash. </rant>
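For comparison, a minimal sketch of pulling a page title with HTML::TokeParser. The sloppy sample markup and the page_title helper are assumptions invented for this example. Fetching the page is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sloppy markup: no closing head tag, unclosed p and b tags.
my $html = '<html><head><title>  Example   Site </title>'
         . '<body><p>first<p>second<br><b>unclosed bold';

# Hypothetical helper: return the whitespace-trimmed text of the first
# title element, or undef when the markup has none.
sub page_title {
    my ($markup) = @_;
    my $p = HTML::TokeParser->new( \$markup )
        or die "cannot create parser\n";
    # Skip tokens until the first title start tag, if there is one.
    return undef unless $p->get_tag('title');
    # Collect the text up to the closing title tag, collapsing whitespace.
    return $p->get_trimmed_text('/title');
}

my $title = page_title($html);
print defined $title ? "Title: $title\n" : "No title found\n";
```

The tokenizer walks the markup tag by tag instead of building a tree, so the unclosed tags in the sample do not stop it.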