Saving and opening an Excel file with openxlsx: data is missing

79 Views Asked by At

I am currently writting a process in R that is supposed to import data from a server and store it in an Excel file as output. The process has several steps and must be automated throug Airflow, so manually opening the Excel file in between is not a possibility. Some of these steps use data previously exported to the same Excel file and are described in separate scripts, so the data can not be taken from the environment variables and has to be imported from the Excel document.

Since the client requires a specific format for this Excel, I can not generate a new one from scratch and have to use an existing one as template. The first thing I do is a copy of the file, substitute all currency values by zeros -to avoid interference with the new values- and load this template into R Studio. No modification is done to the formulas present in other sheets of the document.

Then I generate a dataframe with the data imported from the server, write this dataframe to the specified sheet in the document and save the workbook.

library(tidyverse)
library(lubridate)
library(openxlsx)

# 0. LOAD TEMPLATE -------------------------------------------------------

path <- "template.xlsx"

wb <- loadWorkbook(path)

# 5.2 Save df to wb -------------------------------------------------------

writeData(wb, sheet = "sheet_name", df, startCol = 1, startRow = 1, colNames = T)

# 5.3 Save wb as Excel ----------------------------------------------------

path_save <- "output.xlsx")

saveWorkbook(wb, path_save, overwrite = T)

If I manually open the Excel file the new data is there and the formulas in other sheets have successfully used that data to calculate results. The problem is: if I load this same Excel file back into a new dataframe in R Studio it appears still full of zeros, without the data that I previously exported onto it. If I manually open the file and save it, then the data appears when loading it into my R script.

I have read old threads where someone commented that the data is actually not saved onto the Excel file until you do it manually, but somehow stored in a cache. I have also seen sugestions about a command to force refreshing formulas before saving or loading a workbook, but sadly they only work with XLConnect or xlsx, and these do not work for me (java heap space problem that I can not solve).

Does anyone know of a solution for this problem? Since I need to load the new data in the Excel file as input for succesive Scripts and they must be automated I am quite conditioned by this problem.

0

There are 0 best solutions below