How to efficiently download multiple files from SharePoint using R


I have an Excel file containing a list of 10,000+ documents (PDFs and Word documents) that I want to download. Each file is linked to a SharePoint URL.

My goal is to devise a script in R that can automatically access and download these documents.

I tried the following, but it resulted in damaged downloads.

library(readxl)
library(httr) 

data <- read_excel("data/in/doclist.xlsx")
urls <- data$url

for (url in urls) {
  # Send a GET request to the URL
  response <- GET(url)
  
  # Extract the file name from the URL
  file_name <- basename(url)
  
  # Specify the path where you want to save the downloaded files
  save_path <- paste0("data/out/", file_name)
  
  # Save the downloaded file
  writeBin(content(response), save_path)
  
  # Print a message to indicate the successful download
  cat("File", file_name, "downloaded successfully.\n")
}
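A minimal sketch of a more defensive version of the loop body, assuming the same httr setup as above. It refuses to write anything to disk unless the server returned a success status and a non-HTML body, so a sign-in or error page is reported instead of being saved under a .pdf or .docx name. The helper names `file_name_from_url` and `download_checked` are hypothetical, and this does not solve authentication by itself; it only makes the failure visible.

```r
# Derive a local file name from a URL: drop any query string or fragment,
# then decode percent-escapes such as %20.
file_name_from_url <- function(url) {
  utils::URLdecode(basename(sub("[?#].*$", "", url)))
}

# Download one URL, keeping the bytes only if the response looks like a file.
# Returns TRUE on success and FALSE (with a warning) otherwise.
download_checked <- function(url, out_dir = "data/out") {
  response <- httr::GET(url)

  # 4xx/5xx responses carry an error page, not the document
  if (httr::http_error(response)) {
    warning("HTTP ", httr::status_code(response), " for ", url)
    return(invisible(FALSE))
  }

  # A 200 response with an HTML body is most likely a sign-in page
  if (identical(httr::http_type(response), "text/html")) {
    warning("Got an HTML page instead of a document for ", url)
    return(invisible(FALSE))
  }

  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  save_path <- file.path(out_dir, file_name_from_url(url))

  # Ask for the raw bytes explicitly rather than relying on automatic parsing
  writeBin(httr::content(response, as = "raw"), save_path)
  invisible(TRUE)
}
```

With the `urls` vector from above, something like `ok <- vapply(urls, download_checked, logical(1))` would give a per-URL success flag that can be written back next to the Excel list.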

I cannot open the downloaded documents ("Word experienced an error trying to open the file" and "Adobe Acrobat Reader could not open xyz because it is either not a supported file type or because the file has been damaged").

I suspect the issue may be due to the two-factor authentication requirement to access these documents on SharePoint. When attempting to download publicly accessible PDFs, the code works smoothly, allowing me to open the downloaded files. Additionally, I can access the documents individually in SharePoint since I possess the necessary login credentials.
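One way to test this suspicion is to look at the first few bytes of the files that were already saved. A real PDF starts with `%PDF`, a .docx file is a ZIP archive starting with `PK`, and a legacy .doc file starts with the bytes D0 CF 11 E0, whereas a saved sign-in page would start with HTML markup. The sketch below is only a rough check under those assumptions (the function name `sniff_file_type` is hypothetical, and an HTML file with leading whitespace or a byte-order mark would be reported as "unknown").

```r
# Classify a file by its leading bytes ("magic number").
sniff_file_type <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8)

  # TRUE if the file begins with the given raw signature
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && identical(bytes[seq_along(sig)], sig)
  }

  if (starts_with(charToRaw("%PDF"))) {
    "pdf"
  } else if (starts_with(as.raw(c(0x50, 0x4B, 0x03, 0x04)))) {
    "zip"      # .docx, .xlsx and similar Office Open XML files
  } else if (starts_with(as.raw(c(0xD0, 0xCF, 0x11, 0xE0)))) {
    "ole"      # legacy .doc, .xls
  } else if (starts_with(charToRaw("<"))) {
    "html"     # markup, e.g. a sign-in or error page
  } else {
    "unknown"
  }
}
```

Running `table(vapply(list.files("data/out", full.names = TRUE), sniff_file_type, character(1)))` over the output folder would show how many of the "damaged" files are actually HTML pages.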

Any guidance or alternative approaches to do this would be greatly appreciated. Thank you in advance.
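One possible alternative, sketched here under several assumptions, is to go through an authenticated SharePoint client instead of plain GET requests. The Microsoft365R package (not used in the code above) signs in interactively through the browser, which should accommodate two-factor prompts, although some organisations require an administrator to approve the app first. In the sketch, `site_url`, `library_root`, and the helper names `drive_relative_path` and `download_from_sharepoint` are placeholders, and it assumes all documents live in the site's default document library.

```r
# Hypothetical helper: turn a full document URL into a path relative to the
# document library root, e.g. ".../Shared Documents/a/b.pdf" -> "a/b.pdf".
drive_relative_path <- function(url, library_root) {
  path <- utils::URLdecode(sub("[?#].*$", "", url))
  root <- sub("/+$", "", utils::URLdecode(library_root))
  if (!startsWith(path, paste0(root, "/"))) {
    stop("URL is not inside the given library: ", url)
  }
  substring(path, nchar(root) + 2)
}

# Download every URL through an authenticated SharePoint drive object.
# Requires the Microsoft365R package and an interactive sign-in.
download_from_sharepoint <- function(urls, site_url, library_root,
                                     dest_dir = "data/out") {
  site  <- Microsoft365R::get_sharepoint_site(site_url = site_url)
  drive <- site$get_drive()  # assumption: the default document library
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)

  for (url in urls) {
    # Report failures per file instead of aborting the whole batch
    tryCatch({
      src  <- drive_relative_path(url, library_root)
      dest <- file.path(dest_dir, basename(src))
      drive$download_file(src, dest = dest, overwrite = TRUE)
    }, error = function(e) {
      message("Failed: ", url, " (", conditionMessage(e), ")")
    })
  }
  invisible(NULL)
}
```

Because the sign-in happens once per session, the loop over 10,000+ files should not prompt again for each document; files that share a base name in different folders would overwrite each other in `dest_dir`, so the destination naming may need adjusting.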
