What's the fastest way to append to an exisitng excel file hosted in cloud without losing any styling or slicer for pivot table?
The excel is hosted in a cloud storage. The goal is to:
- Retrieve the excel file, that has pre-built pivots and slicers in it, no charts or specific cell-formatting. File has to stay in RAM.
- Export some data from database (around 9000 rows and 140 columns).
- Append data in the excel file to a specific sheet.
This will be deployed as a cloud function, so I can't save the excel file after downloading. Whatever writer I use has to be able to read excel from memory, append the data without messing up the pivots or slicers, and send the excel back to the user. All the steps has to happen under a minute, at least the part with excel. So looking for a bulk insert operation, rather than cell by cell or using loops.
I tried Pandas ExcelWriter, which seems slow (70 to 90 seconds), and does not preserve the slicers in excel files. Sample code below:
import pandas as pd, numpy as np, requests, time
from io import BytesIO
from datetime import datetime, timedelta
# Sample code
##########################################################################################
# Data generation
# This is only to generate sample data to replicate data that comes from the database
# Not related to actual code/data/process
sampledf = pd.DataFrame(np.random.rand(9000,140)*1000)
col_prefix = "col_"
row_prefix = "row_"
j = 0
colNames = []
rowNames = []
for i in range(1,141):
colNames.append(col_prefix+str(i))
for i in range(0,9000):
if i%1000 == 0:
j = i+1
rowNames.append(row_prefix+str(j))
sampledf.columns = colNames
sampledf['Names'] = rowNames
dateList=pd.date_range(start='2018-01-01', end=(datetime.strptime('2018-01-01', '%Y-%m-%d')+timedelta(days=999)).strftime('%Y-%m-%d')).to_list()
sampledf['Date'] = dateList*9
##########################################################################################
start = time.time()
########################
# Relevant code
########################
appendTabName = 'raw_data'
savename = 'pivotshell_with_data.xlsx'
excelFileUrl = "https://azurelearnstr.blob.core.windows.net/samplestg/pivotshell.xlsx?sp=r&st=2024-03-09T09:14:28Z&se=2024-03-19T17:14:28Z&spr=https&sv=2022-11-02&sr=b&sig=7dI3Pq8IM8HdRxSgFDoVjp74AJfDYPoCDByqlqK%2FSz0%3D"
# Get the shell file
shellfile = BytesIO(requests.get(excelFileUrl).content)
with pd.ExcelWriter(shellfile, mode='a', if_sheet_exists='replace', engine='openpyxl') as writer:
sampledf.to_excel(writer, sheet_name=appendTabName, index=False)
# This is just to check if the write operation was successful or not.
# Actual function would just send the <shellfile> as an HTTP response content
# As you can see the saved file loses the slicers
with open(savename, "wb") as f:
f.write(shellfile.getbuffer())
end = time.time()
duration = end - start
print(f"{duration:.2f} seconds")
I looked into other packages: python-calamine (read only), PyExcelerate (does not support reading an existing file or from in-memory, write only), and xlwings (couldn't work out editing excel in-memory).