To parse reddit.com, I use:
xidel -e '//div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]/@href|//div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]/div/h3/text()' "https://www.reddit.com/r/bash"
The base XPath is repeated twice, so I decided to use a xidel variable:
xidel -se 'xp:=//div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]' \
-e '$xp/@href|$xp/div/h3/text()' 'https://www.reddit.com/r/bash'
but the output differs from that of the previous command.
Bonus if someone can show a way to join each URL and title with a space instead of a newline. I tried fn:string-join() and fn:concat() with no luck.
I tried || " " || too, but did not get the expected url <description> line for each match.
The output wouldn't differ if you had added --extract-exclude=xp. Please see my answer here and the quote from the readme in particular.
What you're probably seeing are the text nodes from your XPath expression, printed because the xp variable is itself part of the output. It does actually save the element nodes, but --output-node-format=text is the default after all.
However, you really don't need these kinds of internal variables for situations like this. I personally only use them for exporting to system variables. If you want to use variables, use a FLWOR expression:
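A minimal sketch of such a FLWOR query, assuming the same base XPath as in the question. To keep it reproducible it runs against a small hypothetical page (the sample_html helper is made up for illustration) piped in on stdin instead of the live Reddit URL:

```shell
# Hypothetical sample page: two posts laid out like the markup in the question.
sample_html() {
  for n in 1 2; do
    printf '<div data-click-id="background"><div data-adclicklocation="title"><div>'
    printf '<a data-click-id="body" href="/r/bash/comments/%s/"><div><h3>Post %s</h3></div></a>' "$n" "$n"
    printf '</div></div></div>\n'
  done
}

# FLWOR: bind each matching <a> to $x once, then return its href and its title.
out=$(sample_html | xidel -s --input-format=html - -e '
  for $x in //div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]
  return ($x/@href, $x/div/h3)
')
printf '%s\n' "$out"
```

Against the real page, drop the pipe and pass 'https://www.reddit.com/r/bash' in place of the - argument. Because $x only lives inside the expression, no variable is added to the output.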
But the simplest query, without the need for variables, would probably be:
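One way such a variable-free query could look, as a sketch: it assumes that the data-click-id="body" attribute alone is selective enough, and it uses a path step with a parenthesized sequence so the base XPath is written only once. The sample_html helper is a hypothetical stand-in for the live page:

```shell
# Hypothetical sample page: two posts laid out like the markup in the question.
sample_html() {
  for n in 1 2; do
    printf '<div data-click-id="background"><div data-adclicklocation="title"><div>'
    printf '<a data-click-id="body" href="/r/bash/comments/%s/"><div><h3>Post %s</h3></div></a>' "$n" "$n"
    printf '</div></div></div>\n'
  done
}

# For every matching <a>, the step (@href, div/h3) yields its href and then its title.
out=$(sample_html | xidel -s --input-format=html - -e '//a[@data-click-id="body"]/(@href,div/h3)')
printf '%s\n' "$out"
```

Each (@href, div/h3) pair is evaluated with the <a> element as context item, so URL and title stay grouped per post.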
String-joining is as simple as:
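The three variants below are a sketch of how that could be written, not necessarily the exact queries meant here: the standard string-join(), the || operator, and Xidel's x"..." string syntax. They assume the shortened //a[@data-click-id="body"] path and run against a hypothetical sample page (sample_html is made up for illustration):

```shell
# Hypothetical sample page: two posts laid out like the markup in the question.
sample_html() {
  for n in 1 2; do
    printf '<div data-click-id="background"><div data-adclicklocation="title"><div>'
    printf '<a data-click-id="body" href="/r/bash/comments/%s/"><div><h3>Post %s</h3></div></a>' "$n" "$n"
    printf '</div></div></div>\n'
  done
}

# 1. string-join() over the (href, title) sequence of each <a>.
out1=$(sample_html | xidel -s --input-format=html - -e '//a[@data-click-id="body"]/string-join((@href,div/h3)," ")')

# 2. The || operator, wrapped in parentheses so it is evaluated once per <a>.
out2=$(sample_html | xidel -s --input-format=html - -e '//a[@data-click-id="body"]/(@href||" "||div/h3)')

# 3. An x"..." string, where each {...} is evaluated as an expression.
out3=$(sample_html | xidel -s --input-format=html - -e '//a[@data-click-id="body"]/x"{@href} {div/h3}"')

printf '%s\n' "$out1" "$out2" "$out3"
```

All three should print one "url title" line per post.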
With || don't forget the parentheses, or there's no context item for div/h3. The last one is Xidel's own extended string syntax.
Alternatively, you could parse the huge JSON, which surprisingly lists a lot more Reddit questions: