I have JSON like this:
[
{
"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link",
"title": "&#8211; Flexibility"
},
{
"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra",
"title": "&#8211; Pronouns"
}
]
I got it using:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.'
I have a command on my machine named unescape_html, a Python script that unescapes HTML (for example, it replaces &#8211; with the corresponding character).
How can I apply this command to each of the titles using jq?
For example, I want to run:
unescape_html "&#8211; Flexibility"
unescape_html "&#8211; Pronouns"
The expected output is:
[
{
"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link",
"title": "– Flexibility"
},
{
"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra",
"title": "– Pronouns"
}
]
Update 1:
If jq doesn't have that feature, I am also fine with applying the command in the rg step.
I mean, can I run unescape_html on $2 in this line:
rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}'
Any other bash approach to solve this problem is also fine. The point is that I need to run unescape_html on each title so that I get the expected output.
Update 2:
The following command:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= @sh "unescape_html \(.)")'
gives:
[
{
"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link",
"title": "unescape_html '&#8211; Flexibility'"
},
{
"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra",
"title": "unescape_html '&#8211; Pronouns'"
}
]
It only builds the command strings; it does not evaluate them.
The following command works:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= sub("&#8211;"; "–"))'
But it only handles &#8211;. It will not work for other HTML entities.
Update 3:
The following command:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq -r '.[] | "unescape_html \(.title | @sh)"' | bash
is giving:
– Flexibility
– Pronouns
So, it is applying the bash command. Now I need to format the result so that each URL comes with its unescaped title in JSON format.
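One possible way to put the unescaped titles back next to their URLs is to loop over the objects in bash and let jq rewrite each title from a shell variable. This is only a sketch under assumptions: the unescape_html function below is a stand-in I wrote for illustration (it assumes the real script prints the unescaped text of its first argument), and the example.com URLs and sample titles are placeholders, not the real data.

```shell
#!/usr/bin/env bash
# Stand-in for the real unescape_html command (assumption: it takes the
# text as its first argument and prints the unescaped result).
unescape_html() {
  python3 -c 'import html, sys; print(html.unescape(sys.argv[1]))' "$1"
}

# Placeholder input with the same shape as the JSON in the question.
input='[{"url":"https://example.com/a","title":"&#8211; Flexibility"},
        {"url":"https://example.com/b","title":"&amp; Pronouns"}]'

result=$(
  # -c prints one compact object per line, so read -r gets one object each.
  jq -c '.[]' <<<"$input" |
  while IFS= read -r obj; do
    # Pull out the raw title, unescape it in the shell ...
    title=$(jq -r '.title' <<<"$obj")
    # ... and hand it back to jq as a string via --arg.
    jq -c --arg t "$(unescape_html "$title")" '.title = $t' <<<"$obj"
  done |
  # -s collects the rewritten objects back into a single array.
  jq -s '.'
)
printf '%s\n' "$result"
```

In the real pipeline, the here-string with the sample input would be replaced by the output of the curl | rg | jq -s '.' command. This starts one unescape_html process and two jq processes per title, which should be fine for a short list like this one but would be slow for thousands of entries.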
OP Here.
Both of the following commands are working for me:
and