How can I solve the Scopus API limitation issue using rscopus?


I am trying to analyze article bibliographies using the Scopus API.

When I run scopus_search(), I hit an error that seems to be caused by the API's 5,000-result limit. I tried changing the start and max_count arguments, but that does not work.

rt_query <- scopus_search(query = "TITLE-ABS-KEY(Radiotherapy) AND PUBYEAR = 2022",
                          view = 'COMPLETE',
                          start = 5001,
                          max_count = 10000,
                          headers = insttoken)

Error in get_results(query, start = init_start, count = count, verbose = verbose,  : 
  Bad Request (HTTP 400).

The weird thing is that the limit only kicks in when I try to fetch results beyond the first 5,000.

Any idea on how to resolve this?

Answered by Solveig Bjørkholt

Did you figure this out? I think the rscopus package may no longer be able to get past the limit, since the Elsevier API used to have an offset parameter that has now been replaced by a cursor parameter (gleaned that from here).

Calling the API directly through httr worked for me. The cursor logic is documented here. The first query uses "cursor=*", while each subsequent query uses the cursor value returned by the previous one, which can be found under search-results$cursor$@next.
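To make the cursor lookup concrete, here is a minimal sketch of pulling the next cursor out of a parsed response. The JSON below is a hypothetical, heavily trimmed stand-in for a real response body (the token and title are made up); only the search-results / cursor / @next path comes from the description above.

```r
library(jsonlite)

# Hypothetical, trimmed-down response body; real responses carry many more fields
sample_json <- '{
  "search-results": {
    "cursor": {"@next": "EXAMPLE_TOKEN"},
    "entry": [{"dc:title": "Example title"}]
  }
}'

parsed <- fromJSON(sample_json, flatten = TRUE)

# Backticks are needed because the names contain "-" and "@"
next_cursor <- parsed$`search-results`$cursor$`@next`
print(next_cursor)
```

The value stored in next_cursor is what goes into the cursor parameter of the following request.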

I suppose your query would look something like this:

https://api.elsevier.com/content/search/scopus?query=title-abs-key(Radiotherapy)ANDpubyear(2022)&cursor=*&count=200&apiKey=YOURKEY

(I am getting 0 results from that API query, so I might have specified it wrong, but this works with other queries).
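One thing that might be worth trying when a hand-written URL returns nothing is to let httr build and percent-encode the query string. The sketch below only constructs the URL and does not call the API. It assumes the query text from the question is the intended one, and whether it returns the expected records has not been verified here. YOURKEY is a placeholder.

```r
library(httr)

# Query string copied from the question (an assumption, not a tested query)
scopus_query <- "TITLE-ABS-KEY(Radiotherapy) AND PUBYEAR = 2022"

# modify_url() percent-encodes each parameter value
request_url <- modify_url(
  "https://api.elsevier.com/content/search/scopus",
  query = list(
    query  = scopus_query,
    cursor = "*",        # "*" asks for the first page
    count  = "200",
    apiKey = "YOURKEY"   # placeholder
  )
)
print(request_url)
```

Passing the resulting request_url to GET() would then send the spaces and the "=" inside the query in encoded form.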

You can then run a loop, for example, and pass the new cursor for each batch of 200 records. Like so:

library(dplyr)
library(purrr)
library(httr)
library(jsonlite)

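dfs <- list()
next_cursor <- "*" # "*" requests the first page

for (i in 1:100) { # 100, or however many units you're trying to fetch divided by 200
  
  # Encode the cursor, since tokens may contain characters such as "=" or "+"
  get_response <- GET(url = paste0('https://api.elsevier.com/content/search/scopus?query=title-abs-key(Radiotherapy)ANDpubyear(2022)&cursor=', URLencode(next_cursor, reserved = TRUE), '&count=200&apiKey=YOURKEY'))
  stop_for_status(get_response) # fail early on HTTP errors such as 400 or 401
  
  json <- content(get_response, as = "text", encoding = "UTF-8")
  
  df <- fromJSON(json, flatten = TRUE)
  next_cursor <- df$`search-results`$cursor$`@next`
  
  entries <- df %>% 
    pluck("search-results") %>%
    pluck("entry") %>%
    as.data.frame()
  
  # Stop when a page is empty (an empty result may come back as a single
  # entry holding only an error field)
  if (nrow(entries) == 0 || "error" %in% names(entries)) break
  
  dfs[[i]] <- entries
  
  message("Finished batch number ", i)
  
  # Stop when the API returns no further cursor
  if (is.null(next_cursor)) break
  
  Sys.sleep(0.2)
  
}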

scopus_df <- bind_rows(dfs)
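
The loop bound ("however many units you're trying to fetch divided by 200") can be computed rather than guessed. This is a rough sketch with a hypothetical helper, batches_needed(). It assumes the first response reports the total hit count as a string under search-results, in a field named opensearch:totalResults, which is worth checking against an actual response.

```r
# Hypothetical helper: number of requests needed for a given result count
batches_needed <- function(total_results, page_size = 200) {
  # The count is assumed to arrive as a string, so convert it first
  ceiling(as.numeric(total_results) / page_size)
}

# Example with a made-up total of 12345 hits
n_batches <- batches_needed("12345")
print(n_batches)
```

With a real response, the argument would be df$`search-results`$`opensearch:totalResults` from the first batch, and the result would replace the hard-coded 100.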