I am working on a data prep tutorial, using data from this article: https://www.nytimes.com/interactive/2021/01/19/upshot/trump-complete-insult-list.html#
None of the text is hard-coded, everything is dynamic and I don't know where to start. I've tried a few things with packages rvest and xml2 but I can't even tell if I'm making progress or not.
I've used copy/paste ang regexes in notepad++ to get a tabular structure like this:
| Target | Attack |
|---|---|
| AAA News | Fake News |
| AAA News | Fake News |
| AAA News | A total disgrace |
| ... | ... |
| Mr. ZZZ | A real nut job |
but I'd like to show how to do everything programmatically (no copy/paste).
My main question is as follows: is that even possible with reasonable effort? And if so, any clues on how to get started?
PS: I know that this could be a duplicate, I just can't tell of which question since there are totally different approaches out there :\
Here's a programatic approach with RSelenium and rvest:
The xpath was obtained by inspecting the webpage source (F9 in Chrome), hovering over elements until the correct one was highlighted, right clicking, and choosing copy XPath as shown: