R data.table: Efficiently parsing JSON data from an API call


I am trying to download some data using an API call, but I am sure the code can be optimized to a great extent. As of now I have made just 47 such calls, but in the future this could go up to 20,000 calls. Here is the code.

Edit: Since the link is not accessible to everyone, I have saved the raw_data as an R object at this link. End of Edit

library(RJSONIO)
library(RCurl)
library(data.table)
url = "http://172.31.101.107:11000/wantedapi-v4.0/segments/occ4?usecache=true&responsetype=json&engine=sphinx&country=JP&showrepost=false&msa=5685-id&date=2013-10-20-2017-05-04&passkey=wanted&showstaffing=false&showanonymous=false&showbulk=false&showfree=true&showduplicate=false&showexpired=true&showaggregator=true&showactive=true&usestemming=false&market=country%2C116&methodology=available&pagesize=1000"
raw_data <- getURL(url)
# Then convert from JSON into a list in R
data1 <- fromJSON(raw_data)
# Bind the per-segment lists row-wise, then flip the result so that each
# field becomes one element of a list
data2 <- do.call(rbind, data1[[1]]$segments)
# data2 <- rbindlist(data1[[1]]$segments) # produces an error
data3 <- transpose(data2)
# Reassemble the fields into a data.table with descriptive column names
data4 <- data.table(
                    count = data3[[1]],
                    id = data3[[2]],
                    official_Occ_code = data3[[3]],
                    translation = data3[[4]],
                    official_occ_name = data3[[5]]
                    )
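
For reference, I believe the error on the commented-out rbindlist line can be sidestepped by coercing each segment to a plain list first. A minimal sketch, assuming each element of data1[[1]]$segments is a named list of scalar fields:

# Sketch: stack the segments directly, skipping the manual transpose step.
# fill = TRUE pads any segment that is missing a field with NA.
data4_alt <- rbindlist(lapply(data1[[1]]$segments, as.list), fill = TRUE)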

There is 1 answer below.

Answer by Matt Summersgill:

I didn't try to open the file since it's extension-less and not preview-able (my company's IT security group should be proud), but I've used the pipeline below for handling what I think may be a similar problem:

library(magrittr)
library(data.table)
library(jsonlite)
library(curl)

# Reusable curl handle carrying the API key header
DI <- curl::new_handle()
curl::handle_setheaders(DI, "X-API-KEY" = "my_key_for_the_API")

DI_Request <- "https://api.somewebsite.com/v1/direct-access/foobar?format=json&page=1&pagesize=10000"

# Fetch the raw bytes, decode to a JSON string, parse (flattening nested
# records into columns), and convert to a data.table by reference
curl_fetch_memory(DI_Request, DI)$content %>% 
  rawToChar() %>% 
  fromJSON(flatten = TRUE) %>% 
  setDT() -> Output_Table
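
To scale this toward the 20,000 calls mentioned in the question, one option is to wrap the fetch-and-parse step in a function and bind all pages once at the end. A sketch only, assuming the same hypothetical endpoint as above and that every page returns records with identical fields:

library(magrittr)
library(data.table)
library(jsonlite)
library(curl)

# Hypothetical paged endpoint; adjust the URL template to the real API
fetch_page <- function(page, handle) {
  req <- sprintf(
    "https://api.somewebsite.com/v1/direct-access/foobar?format=json&page=%d&pagesize=10000",
    page)
  curl_fetch_memory(req, handle = handle)$content %>%
    rawToChar() %>%
    fromJSON(flatten = TRUE) %>%
    setDT()
}

h <- curl::new_handle()
curl::handle_setheaders(h, "X-API-KEY" = "my_key_for_the_API")

# Fetch a batch of pages and stack them in one pass; a single rbindlist()
# call is much cheaper than growing a table inside a loop
pages <- lapply(1:5, fetch_page, handle = h)
All_Pages <- rbindlist(pages, fill = TRUE)

Reusing one handle keeps the headers (and connection) across requests; for very large batches, curl's multi interface can also run the fetches concurrently.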