I'm trying to extract the table "AREA 1 Legal Frameworks | Criminalisation of consensual same-sex sexual acts" from https://database.ilga.org/criminalisation-consensual-same-sex-sexual-acts into a data frame in R.
However, I am having trouble selecting the correct table nodes by their class so that I can convert them into a list of data frames.
So far I have the following code:
# Load the packages
library(RCurl)
library(xml2)
library(rvest)
# Download the web page
theurl <- "https://database.ilga.org/criminalisation-consensual-same-sex-sexual-acts"
webpage <- getURL(theurl)
# Parse the html document
htmldoc <- read_html(webpage)
# Find all the table nodes by their class
tablenodes <- html_nodes(htmldoc, ".tablesorter")
# Convert the table nodes to a list of data frames
tablelist <- html_table(tablenodes)
# Select the data frame that contains the table you want
tabledf <- tablelist[[1]]
But when I run it, tablenodes <- html_nodes(htmldoc, ".tablesorter") returns an empty node set (a list of length 0), so there is nothing to convert into data frames in the later steps.
Can anybody help me figure out how to select the correct table nodes and convert them into a list of data frames?
Your problem is that the website is JavaScript-powered, and apparently the table is loaded via JavaScript. You can easily verify that with the following code (in RStudio):
You will see the HTML rvest sees:
You need to fall back to RSelenium and friends to make sure the JavaScript is loaded.
Disclaimer: JavaScript-powered webpages are often a nightmare to scrape, and I am by no means an expert. I developed the following code simply by trial and error, and maybe there are better/smarter ways of doing it without the heavy burden of RSelenium (about which I would also be very curious to learn).
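A minimal sketch of the RSelenium approach described above, under several assumptions: the helper name scrape_rendered_tables is hypothetical, a local Firefox and matching driver are available to rsDriver, a fixed wait is long enough for the JavaScript to finish, and the rendered page exposes the data as ordinary table elements (the "table" selector is a guess, not something confirmed for this site).

```r
# Hypothetical helper: render a JavaScript-driven page in a real browser,
# then hand the rendered HTML to rvest. Nothing runs until it is called.
scrape_rendered_tables <- function(url,
                                   css = "table",       # assumed selector
                                   wait_seconds = 5,    # assumed load time
                                   browser = "firefox",
                                   port = 4545L) {
  # Start a Selenium server plus a browser session.
  rd <- RSelenium::rsDriver(browser = browser, port = port, verbose = FALSE)

  # Always close the browser and stop the server, even on error.
  on.exit({
    rd$client$close()
    rd$server$stop()
  }, add = TRUE)

  remote <- rd$client
  remote$navigate(url)

  # Crude wait so the page's JavaScript can insert the table.
  Sys.sleep(wait_seconds)

  # getPageSource() returns a list; the first element is the rendered HTML.
  page_source <- remote$getPageSource()[[1]]
  rendered <- rvest::read_html(page_source)

  # Convert every matching node into a data frame (returns a list).
  rvest::html_table(rvest::html_elements(rendered, css))
}

# Example call (needs a local browser and driver, so it is left commented out):
# tablelist <- scrape_rendered_tables(
#   "https://database.ilga.org/criminalisation-consensual-same-sex-sexual-acts"
# )
# length(tablelist)
```

If the returned list is still empty, the wait may be too short or the selector may not match the rendered markup; inspecting the page in the browser's developer tools is the usual way to find the right selector.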