I am trying to convert pandas DataFrames to Word tables with python-docx. For large DataFrames, however, my current approach is extremely slow because each cell has to be accessed one by one. As far as I'm aware, the repeated cell lookups in python-docx (`table.rows[i].cells` and `table.cell(i, j)`) are what make the code so slow.
Is there a way to do this without accessing each cell separately? Or is there another, faster way to convert a pandas DataFrame to a Word table?

    from docx import Document

    doc = Document()

    def add_table(df):
        # One table row per DataFrame row, plus one header row per column level.
        table = doc.add_table(df.shape[0] + df.columns.nlevels, df.shape[1])
        table.style = 'Table Grid'
        # Add header rows for tables with more than 1 header level
        if df.columns.nlevels > 1:
            for k in range(df.columns.nlevels):
                for j, cell in enumerate(table.rows[k].cells):
                    cell.text = str(df.columns[j][k])
        else:
            # Add the single header row.
            for j in range(df.shape[-1]):
                table.cell(0, j).text = str(df.columns[j])
        # Add the rest of the DataFrame below the header rows
        for i in range(df.shape[0]):
            for j, cell in enumerate(table.rows[i + df.columns.nlevels].cells):
                cell.text = str(df.values[i, j])

Input data:

          Numb Description
    0      301      DESC 1
    1      302      DESC 2
    2      303      DESC 3
    3      304      DESC 4
    4      305      DESC 5
    ...    ...         ...
    2131  9108      DESC 6
    2132  9109      DESC 7
    2133  9110      DESC 8
    2134  9111      DESC 9
    2135  9112     DESC 10

Expected output:

| Numb | Description |
|---|---|
| 301 | Desc 1 |
| 302 | Desc 2 |
| 303 | Desc 3 |
| 304 | Desc 4 |
| 305 | Desc 5 |
Edit: Found a great solution that builds the table's list of cells only once and then iterates through that list of cell objects: https://github.com/python-openxml/python-docx/issues/174
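
For anyone landing here, this is a minimal sketch of that idea, not necessarily the exact code from the linked issue. It assumes the cell list in question is python-docx's private `table._cells` attribute (a flat, row-major list of every cell), which is not part of the public API and could change between versions. The function name `add_table_fast` is my own, and `doc` is passed in as an argument instead of being a global.

```python
def add_table_fast(doc, df):
    """Write df into a new Word table, reading the cell list only once."""
    n_header = df.columns.nlevels  # 1 for a flat header, >1 for MultiIndex columns
    n_rows, n_cols = df.shape
    table = doc.add_table(rows=n_rows + n_header, cols=n_cols)
    table.style = "Table Grid"

    # Assumption: _cells is a private python-docx attribute holding every cell
    # in row-major order. Reading it once avoids rebuilding the list per row.
    cells = table._cells

    # Header rows: one row per column level.
    for j, name in enumerate(df.columns):
        levels = name if n_header > 1 else (name,)
        for k, label in enumerate(levels):
            cells[k * n_cols + j].text = str(label)

    # Body rows start right after the header rows.
    for i, row in enumerate(df.itertuples(index=False), start=n_header):
        for j, value in enumerate(row):
            cells[i * n_cols + j].text = str(value)

    return table
```

It would be called as `add_table_fast(doc, df)`, where `doc` comes from `Document()` in `docx` and `df` is the DataFrame. Indexing the flat list as `cells[row * n_cols + col]` only holds for a freshly created table with no merged cells.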