I have made many failed attempts to scrape a page from a website that I have successfully scraped in other use cases. In this particular case I can't seem to get anything but the error:
"Error in UseMethod("html_table"): no applicable method for 'html_table' applied to an object of class 'xml_missing'."
In general, when web scraping in R, I've had difficulty finding the right CSS selectors (or their sequencing), and tools like SelectorGadget have been of little help.
See below for several code chunks I've tried. I'd be grateful for any proposed solutions. As a bonus, pointers to good resources on web scraping in R are appreciated.
library(tidyverse)
library(rvest)
library(xml2)

url <- 'https://baseballsavant.mlb.com/leaderboard/percentile-rankings?type=batter&team='
hitting <- url %>%
  read_html() %>%
  html_node('#prLeaderboard div.table-savant') %>%
  html_table()

url <- 'https://baseballsavant.mlb.com/leaderboard/percentile-rankings?type=batter&team='
hitting <- url %>%
  read_html() %>%
  html_element(xpath = "//div[@id='statcastHitting']/div[@class='table-savant']") %>%
  html_table()

url <- 'https://baseballsavant.mlb.com/leaderboard/percentile-rankings?type=batter&team='
hitting <- url %>%
  read_html() %>%
  html_node("table") %>%
  html_table()
The problem is that this table is not sent as an HTML table; its data arrive inside a <script> tag. You can see this by inspecting either the output of read_html or the page source itself. In your browser this script gets executed and populates the table, but rvest does no such thing. It is possible to evaluate the JavaScript and extract the variable holding the data, though (see also here).
There is, however, a much easier way to get these data: the website provides a direct CSV download. Just add &csv=true to the URL; no other tools are needed.
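As a minimal sketch of the CSV route, assuming the site still honours the csv=true parameter and that base R's read.csv can fetch https URLs on your setup: the helper name fetch_percentiles is hypothetical and only there for illustration, and the columns you get back depend entirely on what the site returns.

```r
# Same leaderboard URL as in the question, with the CSV flag appended.
url <- "https://baseballsavant.mlb.com/leaderboard/percentile-rankings?type=batter&team="
csv_url <- paste0(url, "&csv=true")

# Hypothetical helper (not part of any package): download the CSV export
# and parse it into a data frame. Needs network access; column names and
# types are whatever the site sends back.
fetch_percentiles <- function(u) {
  read.csv(u, stringsAsFactors = FALSE, check.names = FALSE)
}

# Usage (commented out because it hits the network):
# hitting <- fetch_percentiles(csv_url)
# str(hitting)
```

If you prefer the tidyverse, readr::read_csv(csv_url) should work the same way, since it also accepts a URL as its first argument.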