I am working on a problem where I need to process the groups of my dataset separately. After I process each group, I want to write it to a CSV file to save the progress (processing all the data can take days, and I don't want to risk losing progress if something happens). My current bodge/solution is to use bind_rows after each iteration to build the final data, and then overwrite the CSV file with the entire dataset processed so far. However, at some point this often causes R to freeze at the writing step (it stays stuck writing to the CSV file until I terminate the session, even after leaving it for hours). I do not get this problem when appending.
However, when I append, the variables are not always in the right order. Depending on how a group is processed, it may have a different set of variables, or they may be in a different order. I am hesitant to hard-code a variable list or order, as it could change in the future and already has a large number of variations. I would love a way to append new rows to the CSV file that writes each variable to the correct column and creates new variables if the new rows contain variables that don't yet exist in the CSV file.
Does anyone have any ideas on how best to do this?
Here is an edited/simplified version of the relevant portion of my code:
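#Packages used below: dplyr (bind_rows), readr (write_csv), stringr (str_pad)
library(dplyr)
library(readr)
library(stringr)
#First process a group in my data: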
processedDataTemp <- dataProcessingFunction(single_group_in_data)
#Next, bind rows just processed with the data previously processed (unless it was the first
#group to be processed)
if(!exists('processedDataOutput')){
  #First group: nothing to bind to yet
  processedDataOutput = processedDataTemp
}else{
  processedDataOutput = bind_rows(processedDataOutput, processedDataTemp)
}
#Now, if a file name for the csv file is provided, write all the processed data to the csv
#(overwriting the previously written data)
if(!is.null(outputFileName)){
  cat(str_pad('\r Writing to CSV',50, 'right'))
  write_csv(processedDataOutput,file=outputFileName)
  cat(str_pad('\r Finished Writing to CSV',50, 'right'))
}
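
For reference, the behaviour I am after could be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a tested solution: append_aligned_csv is a hypothetical helper name, and it assumes that matching columns by name is enough and that rewriting the whole file is acceptable on the (hopefully rare) occasions when a group introduces a column the file has not seen before.

```r
library(readr)
library(dplyr)

# Hypothetical helper (not part of readr or dplyr): append `new_rows` to the
# CSV at `path`, matching columns by name rather than by position.
append_aligned_csv <- function(new_rows, path) {
  # First group: no file yet, so write it together with a header.
  if (!file.exists(path)) {
    write_csv(new_rows, path)
    return(invisible(path))
  }

  # Read only the header to learn the column order already on disk.
  existing_cols <- names(read_csv(path, n_max = 0, show_col_types = FALSE))
  unseen_cols <- setdiff(names(new_rows), existing_cols)

  if (length(unseen_cols) == 0) {
    # No unseen columns: fill columns missing from this group with NA,
    # reorder to match the file, and append without a header.
    for (col in setdiff(existing_cols, names(new_rows))) {
      new_rows[[col]] <- NA
    }
    write_csv(new_rows[existing_cols], path, append = TRUE)
  } else {
    # Unseen columns: the header has to change, so rewrite the file once.
    # Everything is handled as character here (an assumption made to avoid
    # type clashes in bind_rows; number formatting may differ slightly).
    old_rows <- read_csv(path, col_types = cols(.default = col_character()))
    new_rows <- mutate(new_rows, across(everything(), as.character))
    write_csv(bind_rows(old_rows, new_rows), path)
  }
  invisible(path)
}
```

Inside the loop this would be called as append_aligned_csv(processedDataTemp, outputFileName), so only the newly processed group is written in the common case.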