pandas.DataFrame.to_sql inserts data, but doesn't commit the transaction


I have a pandas dataframe I'm trying to insert into MS SQL EXPRESS as per below:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc://user:password@testodbc")
connection = engine.connect()

data = {'Host': ['HOST1','HOST2','HOST3','HOST4'],
    'Product': ['Apache HTTP 2.2','RedHat 6.9','OpenShift 2','JRE 1.3'],
    'ITBS': ['Infrastructure','Accounting','Operations','Accounting'],
    'Remediation': ['Upgrade','No plan','Decommission','Decommission'],
    'TargetDate': ['2018-12-31','NULL','2019-03-31','2019-06-30']}

df = pd.DataFrame(data)

When I call:

df.to_sql(name='TLMPlans', con=connection, index=False, if_exists='replace')

and then:

print(engine.execute("SELECT * FROM TLMPLans").fetchall())

I can see the data alright, but it actually doesn't commit any transaction:

D:\APPS\Python\python.exe 
C:/APPS/DashProjects/dbConnectors/venv/Scripts/readDataFromExcel.py
[('HOST1', 'Apache HTTP 2.2', 'Infrastructure', 'Upgrade', '2018-12-31'), ('HOST2', 'RedHat 6.9', 'Accounting', 'No plan', 'NULL'), ('HOST3', 'OpenShift 2', 'Operations', 'Decommission', '2019-03-31'), ('HOST4', 'JRE 1.3', 'Accounting', 'Decommission', '2019-06-30')]

Process finished with exit code 0


It says here that I don't have to commit, as SQLAlchemy does it for me:

Does the Pandas DataFrame.to_sql() function require a subsequent commit()?

and the suggestions in the following question don't work:

Pandas to_sql doesn't insert any data in my table

I spent a good 3 hours looking for clues all over the Internet, but I'm not getting any relevant answers, or perhaps I don't know how to ask the question.

Any guidance on what to look for would be highly appreciated.

UPDATE

I'm able to commit changes using a pyodbc connection and a full INSERT statement, but pandas.DataFrame.to_sql() with a SQLAlchemy engine doesn't work. It sends the data to memory instead of the actual database, regardless of whether a schema is specified.
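
For reference, the pyodbc path that does persist looks roughly like this (a minimal sketch, assuming the same DSN and credentials as the engine URL above, and that the TLMPlans table already exists with matching columns):

import pyodbc

conn = pyodbc.connect("DSN=testodbc;UID=user;PWD=password")
cursor = conn.cursor()
for row in df.itertuples(index=False):
    cursor.execute(
        "INSERT INTO TLMPlans (Host, Product, ITBS, Remediation, TargetDate) "
        "VALUES (?, ?, ?, ?, ?)", tuple(row))
conn.commit()  # the explicit commit is what makes the rows persist
conn.close()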

I would really appreciate help with this one, or is this possibly a pandas issue I need to report?


2 Answers

Accepted answer, from S.B.G:

I had the same issue. I realised you need to tell pyodbc which database you want to use; for me the default was master, so my data ended up there.

There are two ways you can do this, either:

connection.execute("USE <dbname>")

Or define the schema in the df.to_sql():

df.to_sql(name='<TABLENAME>', con=connection, schema='<dbname>.dbo')

In my case the schema was <dbname>.dbo. I think .dbo is the default, so it could be something else if you have defined an alternative schema.

This was referenced in this answer; it took me a bit longer to realise what the schema name should be.
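
Applied to the question's code, either variant would look roughly like this (a sketch; TLMDB is a placeholder for the actual database name):

# option 1: switch to the right database on the connection first
connection.execute("USE TLMDB")
df.to_sql(name='TLMPlans', con=connection, index=False, if_exists='replace')

# option 2: qualify the schema directly in to_sql
df.to_sql(name='TLMPlans', con=connection, schema='TLMDB.dbo',
          index=False, if_exists='replace')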

Answer from luke:

I had a similar problem when trying to write with df.to_sql (from pandas) using a SQLAlchemy engine created with mssql+pymssql:

sqlalchemy.exc.OperationalError: (pymssql._pymssql.OperationalError) Cannot commit transaction: (3902, b'The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')

It turns out the issue had to do with properly managing transaction commits and connection closing. The easiest way to handle this was to use SQLAlchemy's built-in support for Python's with statement:

from urllib.parse import quote_plus  # escapes special characters in the password
import sqlalchemy

engine = sqlalchemy.create_engine('mssql+pymssql://' + SQL_USERNAME + ':'
    + quote_plus(SQL_PASSWORD) + '@' + SQL_SERVER + '/' + SQL_DB)
with engine.connect() as connection:
    with connection.begin():  # commits on exit, rolls back on exception
        df.to_sql(SQL_TABLE, connection, schema='dbo', if_exists='replace')
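
Exiting the connection.begin() block commits the transaction automatically if no exception was raised, and rolls it back otherwise, so no explicit commit() call is needed.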