'File is not a zip file' error while writing excel file in python from SQL database using openpyxl

79 Views Asked by At

I am working in Pandas, I want to run a loop with a query on a list of tables in database and write the result of each loop in a separate sheet in Excel, but I get an error 'File is not a zip file' Could you help resolve it and help me make my code work? Maybe you have another workaround not using openpyxl?

My current code:

def create_workbook(path):
    workbook = Workbook()
    workbook.save(path)

create_workbook('result.xlsx')

def function_1(table, file):
    book = load_workbook(file)
    writer = pd.ExcelWriter(file, engine = 'openpyxl')
    writer.book = book
    query = f"select * from {table}"
    df = pd.read_sql_query(query , engine)
    df.to_excel(writer, sheet_name = table)
    writer.close()

for table in tables_list:
    function_1(table, 'result.xlsx')
1

There are 1 best solutions below

1
moken On BEST ANSWER

You are opening the newly created Excel file 'results.xlsx' with Openpyxl and then overwriting this file with a new empty file with ExcelWriter.
This is unnecessary.
And perhaps writer.book = book has an issue since when it's run 'result.xlsx' is no longer an XLSX file it is an empty file 0 bytes in size.

Either way as mentioned; opening the file with Openpyxl is unnecessary and shouldn't be done.
When ExcelWriter opens the file it uses Openpyxl to do so if you have specified the engine to be Openpyxl (otherwise it uses Xlsxwriter by default or other specified module ). There is no requirement to load the file using the engine module first and this could cause issues.
In your scenario it is also not necessary to create the file before hand since ExcelWriter default is 'write' mode and it will create that file itself. Therefore the 'create_workbook' function can be done away with.

All that's needed is the code below
'result.xlsx' will be created by the line writer = pd.ExcelWriter(file, engine='openpyxl')

import pandas as pd


def function_1(table, file):

    query = f"select * from {table}"
    df = pd.read_sql_query(query, engine)

    writer = pd.ExcelWriter(file, engine='openpyxl')
    df.to_excel(writer, sheet_name=table)

    writer.close()

function_1('Sheet1', 'result.xlsx')

If you use a context manager you can also do away with the close() too.

import pandas as pd


def function_1(table, file):
    query = f"select * from {table}"
    df = pd.read_sql_query(query, engine)

    with pd.ExcelWriter(file, engine='openpyxl') as writer:
        df.to_excel(writer, sheet_name=table)

function_1('Sheet1', 'result.xlsx')