Pulling data from Teradata using Python - specifically how to 'chunk' data so that hardware resources are not overrun

194 Views Asked by ripvw32 At 20 June 2023 at 18:41

and thank you in advance for any ideas/suggestions/solutions. I have basic knowledge of Python, so if I am misstating something, or missing something please let me know. I am absolutely trying to learn as I go.

So, onto my issue: Have a DB that I don't own (but have read access to) with data I need to pull into my DB for further parsing/joining/etc. Source DB is Teradata, destination DB is Vertica. I have the below script up and running, however the full pull is multiple 10s of millions of rows. I can return 1M rows no problem, so what I want to do is to 'chunk' the data into 1M rows of data, append to my destination DB table, and continue on until all data from the source table has been consumed. Thoughts?

# Import libraries
import teradatasql
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
import sqlalchemy as sa


server_name = 'HostServerName'
domain_user = 'DomainAccount'
domain_password = 'DomainAccountPassword'
login_mechanism = 'LDAP'

Host3 = 'VerticaServerName'
UserName3 = 'VerticaUser'
Password3 = 'VerticaUserPassword'
Database3 = 'VerticaUserDB'

# Query against DB
query1 = 
'SELECT Column1,Column2,Column3,Column4 FROM HostServerName.Table1 WHERE Column3 IS NOT NULL'

#Connect to the HostServerName/Teradata DB and pull data
with teradatasql.connect(
  host=server_name, user=domain_user
, password=domain_password
, logmech=login_mechanism
, encryptdata='true'
) as connect:
 data = pd.read_sql(query1, connect
)
print(data)

# update the dataframe type, converts 'text'/string values to varchar
def updateType(data_para):
    dtypedict = {}  # create and empty dictionary
    for i,j in zip(data.columns,data.dtypes):
        if "object" in str(j):
            dtypedict.update({i: sa.types.VARCHAR})

    return dtypedict
updatedict = updateType(data)

#Connect to the Vertica DB and append data to table
engine = create_engine(f'vertica+vertica_python://{UserName3}
      :{Password3}@{Host3}/{Database3}'
     , pool_pre_ping=True, poolclass=NullPool
)
data.to_sql(
  'VerticaUSERDB.Table2'
, engine
, schema='VerticaSchema'
, if_exists='append'
, dtype=updatedict
, index=False
)

I did try to do this using some other answers on stackoverflow, however it errored out and I don't understand the errors I received.

Original Q&A

Pulling data from Teradata using Python - specifically how to 'chunk' data so that hardware resources are not overrun

There are 0 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in TERADATA

Related Questions in VERTICA

Related Questions in CHUNKS

Trending Questions

Popular # Hahtags

Popular Questions