I am relatively new to R and I am looking for an automated way of saving the text of hundreds of .docx files into one .csv file, which I can then use for computational text analysis. The docx files are all similarly structured but have different file names. Each docx file should become one row in the table, with the following columns: date, URL, title, text. Moreover, I would like to add a column that contains an ID for each row. Can anyone help me?
So far, I have tried to do this with the readtext() function. This worked up to the point where I tried to put the different parts into one data frame. I also do not know yet how to loop over multiple files that are named differently.
library(readtext)
# read text
doc.text <- readtext(".../mytext.docx")$text
# doc.text now contains the plain text of the file
# Split text into parts using new line character:
doc.parts <- strsplit(doc.text, "\n")[[1]]
doc.parts
# first line in the document: title
title <- doc.parts[1]
title
# extract the date (fourth line in the document)
date <- doc.parts[4]
# put in a data frame (c() only produces a character vector,
# and x was never defined -- use data.frame() instead)
x2 <- data.frame(date = date, title = title, text = doc.text)
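One possible way to extend this to many files is to wrap the per-document parsing in a function, loop over the folder with list.files(), and stack the one-row data frames. This is only a sketch: the folder path, the output file name, and the guess that the URL sits on line 2 of each document are my assumptions and need adjusting to your layout (the title-on-line-1 and date-on-line-4 positions come from the snippet above).

```r
# Turn the plain text of one document into a one-row data frame.
parse_doc_text <- function(txt) {
  parts <- strsplit(txt, "\n")[[1]]
  data.frame(
    date  = parts[4],   # as in the snippet: date on line 4
    url   = parts[2],   # ASSUMPTION: URL on line 2 -- adjust as needed
    title = parts[1],   # title on line 1
    text  = txt,
    stringsAsFactors = FALSE
  )
}

# Loop over every .docx in a folder and stack the rows.
if (requireNamespace("readtext", quietly = TRUE)) {
  files <- list.files("path/to/your/folder",   # ASSUMPTION: your docx folder
                      pattern = "\\.docx$", full.names = TRUE)
  if (length(files) > 0) {
    rows   <- lapply(files, function(f) parse_doc_text(readtext::readtext(f)$text))
    result <- do.call(rbind, rows)
    result$id <- seq_len(nrow(result))         # ID column: 1, 2, 3, ...
    result <- result[, c("id", "date", "url", "title", "text")]
    write.csv(result, "corpus.csv", row.names = FALSE)
  }
}
```

Because each file produces a one-row data frame with identical columns, do.call(rbind, rows) stacks them into a single table, and the row order (and therefore the ID) follows the alphabetical order in which list.files() returns the file names.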