I have an Excel file containing a list of 10,000+ documents (PDFs and Word documents) that I want to download. Each file is linked to a SharePoint URL.
My goal is to devise a script in R that can automatically access and download these documents.
I tried the following, but it resulted in damaged downloads.
library(readxl)
library(httr)
data <- read_excel("data/in/doclist.xlsx")
urls <- data$url
for (url in urls) {
  # Send a GET request to the URL
  response <- GET(url)
  # Extract the file name from the URL
  file_name <- basename(url)
  # Specify the path where you want to save the downloaded files
  save_path <- paste0("data/out/", file_name)
  # Save the downloaded file
  writeBin(content(response), save_path)
  # Print a message to indicate the successful download
  cat("File", file_name, "downloaded successfully.\n")
}
I cannot open the downloaded documents. Word reports "Word experienced an error trying to open the file", and Adobe Acrobat Reader reports "Adobe Acrobat Reader could not open xyz because it is either not a supported file type or because the file has been damaged".
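To see what was actually written to disk, the first few bytes of a saved file can be inspected. The following is only a minimal sketch in base R: `sniff_file` is a hypothetical helper name, and the checks assume that PDFs start with `%PDF`, that .docx files are zip containers starting with `PK`, and that a leading `<` hints at an HTML or XML page.

```r
# Hypothetical helper (not part of any package): guess what a saved file
# really contains by looking at its first bytes.
sniff_file <- function(path) {
  # Read at most the first 4 bytes of the file as raw values
  bytes <- readBin(path, what = "raw", n = 4L)
  if (length(bytes) >= 4L && identical(bytes, charToRaw("%PDF"))) {
    "pdf"
  } else if (length(bytes) >= 2L && identical(bytes[1:2], charToRaw("PK"))) {
    "zip container (e.g. .docx)"
  } else if (length(bytes) >= 1L && identical(bytes[1], charToRaw("<"))) {
    "probably HTML or XML (e.g. a sign-in page)"
  } else {
    "unknown"
  }
}
```

Calling `sniff_file()` on one of the files under `data/out/` would then suggest whether the file is a real document or a web page saved under a .pdf or .docx name.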
I suspect the issue is the two-factor authentication required to access these documents on SharePoint. The same code works for publicly accessible PDFs, and I can open those downloaded files. I can also open the documents individually in SharePoint, since I have the necessary login credentials.
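One way to test this suspicion is to look at the status code and content type of a single response before saving anything. This is a rough sketch, not a fix: `diagnose_download` and `check_url` are hypothetical helper names, and the assumption is that an unauthenticated request either fails with 401/403 or returns an HTML sign-in page instead of the document.

```r
# Hypothetical helper: classify a response from its HTTP status code and
# content type (both passed in as plain values).
diagnose_download <- function(status, content_type) {
  if (is.null(content_type) || is.na(content_type)) content_type <- ""
  if (status %in% c(401, 403)) {
    "authentication required"
  } else if (status >= 400) {
    "request failed"
  } else if (grepl("text/html", content_type, fixed = TRUE)) {
    "HTML page returned (possibly a sign-in page), not the document"
  } else {
    "looks like a file"
  }
}

# Hypothetical wrapper: fetch one URL with httr and report what came back.
# Needs the httr package and network access; it is only defined here, not run.
check_url <- function(url) {
  response <- httr::GET(url)
  diagnose_download(httr::status_code(response), httr::http_type(response))
}
```

Running `check_url(urls[1])` on a SharePoint link and on a public PDF link should show whether the two cases differ in the way I suspect.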
Any guidance or alternative approaches to do this would be greatly appreciated. Thank you in advance.