The following Python code works as intended but is incredibly slow: it inserts at most 50 rows a second, and throughput quickly tapers off to less than 10 rows a second.
import datetime as dt

import requests

# Note: cnxn and crsr are a pyodbc connection/cursor created elsewhere,
# e.g. cnxn = pyodbc.connect(conn_str) and crsr = cnxn.cursor().

def test_single_date():
    date = [dt.date(2023, 1, 1)]
    yield date

def get_data(url):
    api_response = requests.get(url)
    api_data = api_response.json()
    return api_data

def process_dates():
    # Step 1: Get the list of dates and create list for values to send to SQL
    dates_to_process = test_single_date()
    upsert_list = []
    # Step 2: Generate URLs and retrieve data
    for date_list in dates_to_process:
        for single_date in date_list:
            # Step 2a: Generate URL
            base_hist_url = 'https://api_url/history/currency/{year}/{month}/{day}'
            hist_url = base_hist_url.format(year=single_date.year, month=single_date.month, day=single_date.day)
            # Step 2b: Retrieve data
            hist_api_data = get_data(hist_url)
            # Step 3: Upsert data
            for k, v in hist_api_data['conversion_rates'].items():
                upsert_list.append(['currency_code', k, v, single_date])
            # Step 3a: Execute SQL upsert statement
            upsert_sql(upsert_list)
    # Step 4: Get the latest conversion rates
    latest_url = 'https://api_url/latest/currency'
    latest_api_data = get_data(latest_url)
    for k, v in latest_api_data['conversion_rates'].items():
        # dt.UTC requires Python 3.11+; use dt.timezone.utc on older versions
        upsert_list.append(['currency_code', k, v, dt.datetime.now(dt.UTC)])
    # Step 4a: Execute SQL statement
    upsert_sql(upsert_list)

def upsert_sql(upsert_list):
    sql = """
    MERGE INTO SQL_TABLE AS Target
    USING (
        VALUES {}
    ) AS Source (currency_from, currency_to, factor, timestamp)
    ON Target.currency_from = Source.currency_from
        AND Target.currency_to = Source.currency_to
        AND CAST(Target.timestamp AS date) = CAST(Source.timestamp AS date)
    WHEN NOT MATCHED THEN
        INSERT (currency_from, currency_to, factor, timestamp)
        VALUES (Source.currency_from, Source.currency_to, Source.factor, Source.timestamp);
    """.format(','.join('(?,?,?,?)' for _ in range(len(upsert_list))))
    params = [item for sublist in upsert_list for item in sublist]
    try:
        crsr.execute(sql, params)
        crsr.commit()
    except Exception as e:
        crsr.rollback()
        print(e)
        print('Transaction rollback')

process_dates()
crsr.close()
cnxn.close()
By itself, the MERGE is near-instant with the small amount of data I am working with; it is the extra steps added for additional functionality that slow the whole process down. The goal is to generate a URL for every date between 1/1/2023 and today that is not already present in SQL_TABLE, send those requests to the API provider, and MERGE the results into SQL_TABLE. After backfilling any missing dates, the function should grab the most recent rates and MERGE those into SQL_TABLE as well. I don't know much about pyodbc, or Python in general for that matter, so any help with the code is very much appreciated.
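For the missing-dates part, this is roughly what I have in mind (an untested sketch; it assumes SQL_TABLE has the timestamp column used in the MERGE above):

def get_missing_dates(crsr):
    # Dates already present in the table (pyodbc returns datetime.date for a date cast)
    crsr.execute('SELECT DISTINCT CAST(timestamp AS date) FROM SQL_TABLE')
    existing = {row[0] for row in crsr.fetchall()}
    # Every date from 2023-01-01 through today
    start = dt.date(2023, 1, 1)
    today = dt.date.today()
    all_dates = (start + dt.timedelta(days=n) for n in range((today - start).days + 1))
    return [d for d in all_dates if d not in existing]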
Also, I have never been able to get pyodbc.executemany() (I tried setting fast_executemany = True) to work with a T-SQL MERGE, nor have I been able to get the MERGE to work fully parameterized with just a single (?,?,?,?) in the VALUES section; I always have to build the placeholder list with format(). Any advice there would be great too.
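For reference, this is the kind of thing I attempted (simplified; sql_template here stands for the same MERGE text as in upsert_sql, but with a single row of placeholders):

# Failed attempt: one (?,?,?,?) row in VALUES, one parameter row per list entry
crsr.fast_executemany = True
single_row_sql = sql_template.format('(?,?,?,?)')
crsr.executemany(single_row_sql, upsert_list)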