I have the following HTML code:
<textarea name="command" class="setting-input fixed-width" rows="9">1</textarea><textarea name="command" class="setting-input fixed-width" rows="5">2</textarea>
I would like to parse it to get the following output:
1
2
Currently I am using:
xmllint --xpath '//textarea[@name="command"]/text()' --html
but it does not append a newline after each match.
Hello from the year 2020!
As of v2.9.9 of libxml, this behavior has been fixed in xmllint itself: xmllint --xpath now prints each match on its own line. However, if you're using anything older than that and don't want to build libxml from source just to get the fixed xmllint, you'll need one of the other workarounds on this page. As of this writing, the latest CentOS 8, for example, still ships a version of libxml (2.9.7) that behaves the way the OP describes.
As I gather from this SO answer, it's theoretically possible to feed a command into the --shell option of older (<2.9.9) versions of xmllint, and this will produce each node on a separate line. However, you end up having to post-process the result with sed or grep to remove the visual detritus of shell mode's (human-oriented) output. It's not ideal.
XMLStarlet, if available, offers another solution, but you need to use xmlstarlet fo to format your HTML fragment into valid XML before using xmlstarlet sel to extract nodes.
If the "Attempt to load network entity" message from the second xmlstarlet invocation annoys you, just add 2>/dev/null at the very end to suppress it (at the risk of suppressing other messages printed to standard error).
The XMLStarlet options explained (see also the user's guide):
- fo -H -R: format the output, expecting HTML input (-H) and recovering as much bad input as possible (-R)
  - this adds an <html> root node, making the fragment in the OP's example valid XML
- sel -T -t -v //xpath -n: select nodes based on the XPath //xpath
  - output plain text (-T) instead of XML
  - use a template (-t) that returns the value (-v) of the node rather than the node itself (allowing you to forgo using text() in the XPath expression)
  - finish with a newline (-n)
Edit(s): Removed the half-implemented xmllint --shell solution because it was just bad. Added an XMLStarlet example that actually works with the OP's data.
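Putting the pieces together, here is a minimal sketch that uses xmllint directly when it reports libxml 2.9.9 or newer, and otherwise falls back to the xmlstarlet fo | xmlstarlet sel pipeline described above. The function name extract_commands is hypothetical, and the version check assumes xmllint --version prints a line containing "using libxml version NNNNN" (20909 corresponds to 2.9.9); adjust it if your build reports its version differently.

```shell
#!/bin/sh
# Sketch: print the text of every <textarea name="command">, one per line.
# extract_commands is a hypothetical helper name; it assumes xmllint
# and/or xmlstarlet are installed and on PATH.

extract_commands() {
    html_file=$1
    xpath='//textarea[@name="command"]'

    # Assumption: `xmllint --version` writes something like
    # "xmllint: using libxml version 20909" (to stderr), where 20909 = 2.9.9.
    ver=$(xmllint --version 2>&1 \
        | sed -n 's/.*using libxml version \([0-9]*\).*/\1/p' \
        | head -n 1)

    if [ -n "$ver" ] && [ "$ver" -ge 20909 ]; then
        # libxml 2.9.9+: --xpath already puts each match on its own line.
        xmllint --html --xpath "$xpath/text()" "$html_file"
    else
        # Older libxml (or no xmllint): turn the HTML into well-formed
        # markup with `fo`, then print each node's value with `sel`.
        # 2>/dev/null hides parser noise such as the
        # "Attempt to load network entity" warning.
        xmlstarlet fo -H -R "$html_file" 2>/dev/null \
            | xmlstarlet sel -T -t -v "$xpath" -n 2>/dev/null
    fi
}
```

With the OP's fragment saved as page.html, running extract_commands page.html should print 1 and 2 on separate lines. Checking the version first keeps the plain xmllint call on systems that already have the fix, and only pays for the two-step XMLStarlet pipeline where it is needed.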