How to use XPath expressions in shell scripting using xmllint
This is a minor tip I want to share: a small example of a nice software feature that made my day.
I've been playing with HTML scraping and took a look at xmllint's (maybe new) features. I wanted to extract a particular pattern, for which the --xpath option looked like a good fit. I've never been very good at tuning XPath expressions, so I searched for a way to approach this and found an amazing feature of xmllint's shell mode. This is the workflow I used:
- get your document; I used an HTML one
- I haven't tested this with broken HTML, but you can try it with xmllint --html
- enter the shell: xmllint --html --shell [document]; keep in mind that [document] can be a remote URI
- in shell mode you can search for an exact string; in my case I chose one that appears inside the desired pattern: grep [string]
- here is where the magic happens: for each match, xmllint answers with the XPath expression of the element that contains the string, which you can reuse in an XPath query
- exit the shell (bye, quit or exit)
- copy the extracted XPath expression to the command line: xmllint --html --xpath [xpath] [document]
- that's it.
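The same workflow can also be run without typing anything into the interactive shell, by piping the commands to xmllint --shell. The following is only a minimal sketch under some assumptions: the sample HTML, the search string "EUR" and the variable names doc, xpath and text are made up for illustration, and the path is pulled out of the shell output with a plain grep -o, which relies on paths in an HTML document starting with /html.

```shell
#!/bin/sh
# Hypothetical sample document, used only for this illustration.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<html><body>
<div><h1>Catalog</h1><p class="price">19.99 EUR</p></div>
</body></html>
EOF

# Feed "grep" and "bye" to the xmllint shell on stdin.
# For each text node containing the string, the shell prints the path of
# its parent element followed by " : " and a listing of that element.
# We keep only the first path found.
xpath=$(printf 'grep EUR\nbye\n' \
  | xmllint --html --shell "$doc" 2>/dev/null \
  | grep -o '/html[^ ]*' \
  | head -n 1)

if [ -z "$xpath" ]; then
  echo "no match found (is xmllint installed?)" >&2
else
  echo "extracted path: $xpath"
  # Reuse the extracted path in a normal --xpath query and take the text node.
  text=$(xmllint --html --xpath "$xpath/text()" "$doc" 2>/dev/null)
  echo "text: $text"
fi

rm -f "$doc"
```

The path printed by grep is an absolute one, so it is a good starting point but usually too rigid for scraping; it tends to break as soon as the page layout changes.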
You can then tune your expression by adding predicates (for example, matching on specific attributes), by extracting the text() node, and so on.
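As a rough illustration of that kind of tuning, the sketch below replaces an absolute path with attribute predicates and then asks for the text() node or the string() value. The sample document, its id and class attributes, and the variable names are assumptions invented for this example, not taken from any real page.

```shell
#!/bin/sh
# Hypothetical sample document, used only for this illustration.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<html><body>
<div id="a"><p class="name">Widget</p><p class="price">19.99 EUR</p></div>
<div id="b"><p class="name">Gadget</p><p class="price">5.00 EUR</p></div>
</body></html>
EOF

# Attribute predicates select one element regardless of its position,
# and /text() returns only its text node instead of the whole <p> element.
price_b=$(xmllint --html --xpath '//div[@id="b"]/p[@class="price"]/text()' "$doc" 2>/dev/null)

# string() returns the string value of the first node that matches.
name_a=$(xmllint --html --xpath 'string(//div[@id="a"]/p[@class="name"])' "$doc" 2>/dev/null)

echo "price in div b: $price_b"
echo "name in div a: $name_a"

rm -f "$doc"
```

Each query here is written to match a single node on purpose: when an expression matches several nodes, how xmllint separates them in its output has varied between versions, so it is safer to narrow the expression down first.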