I have a question about running a Python script inside an SSIS package, and then running that package in a SQL Server Agent job.
Goal
First, let me briefly describe what I would like to achieve: with a Python script I want to fetch Yahoo Finance stock data for certain stocks and save that data into a SQL Server database.
Next, I would like to include that process as a step in a SQL Server Agent job, so that all the data (including the stock data) is loaded into my database at once. For me the easiest way to achieve that was to wrap the script in an SSIS package and then run that package in a job.
I'm running all of this on my local server.
Problem
I've made the script, and it runs successfully in the command interpreter, in Visual Studio, and in SSMS (when I run the script manually):

But when I run the package as a job step, I get the following error:

I already searched on Stack Overflow, Google and ChatGPT, but I haven't found a solution yet. I suspect it has something to do with the 'user' SQLSERVERAGENT, but I already granted SQLSERVERAGENT access to the folder where the Python files are located.
These are the settings I use in Visual Studio:

And this is the Python script I'm running:
import pandas as pd
import yfinance as yf
from sqlalchemy import create_engine
from datetime import datetime

servername = "XXXXXXX"
dbname = "sql_prtfl_dw"
table = "lz_yahoo_finance_stock_prices"

engine = create_engine("mssql+pyodbc://@" + servername + "/" + dbname + "?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server")
conn = engine.connect()

TickerSymbols = pd.read_sql(
    """
    SELECT
        Ticker
    FROM DimAssets
    WHERE 1=1
        AND AssetClass IN ('Equities')
        AND IsCurrent = 1
    """, engine)

conn.invalidate()
engine.dispose()

TickerSymbols = TickerSymbols["Ticker"].values.tolist()

servername = "XXXXXXX"
dbname = "sql_prtfl_lz"
table = "lz_yahoo_finance_stock_prices"

engine = create_engine("mssql+pyodbc://@" + servername + "/" + dbname + "?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server")
conn = engine.connect()

def sql_importer(symbol, table=table, start="2019-01-01"):
    try:
        max_date = pd.to_datetime(pd.read_sql(f"SELECT MAX(Date) FROM [{table}] WHERE Ticker IN ('{symbol}')", engine).values[0][0])
        print(max_date)
        new_data = yf.download(symbol, start=max_date)
        new_data.insert(0, "Ticker", symbol)
        new_data.insert(7, "EtlLoadDate", datetime.now())
        new_rows = new_data[new_data.index > max_date]
        new_rows.to_sql(table, engine, if_exists="append")
        print(f"Ticker: {symbol} - {str(len(new_rows))} extra rows imported in table: {table}")
    except:
        new_data = yf.download(symbol, start=start)
        new_data.insert(0, "Ticker", symbol)
        new_data.insert(7, "EtlLoadDate", datetime.now())
        new_data.to_sql(table, engine, if_exists="append")
        print(f"Ticker: {symbol} - First time import of data. {str(len(new_data))} new datarows are imported in table: {table}")

for ticker in TickerSymbols:
    sql_importer(ticker)

conn.invalidate()
engine.dispose()
Thanks in advance!
When the job step runs python myfile.py, trusted_connection=yes is directing the driver to do Windows authentication as the account that owns the process — in a job step that is the SQL Server Agent service account (SQLSERVERAGENT), which needs its own login and database permissions, not just rights on the script folder. The likely failing call is the first read_sql, which is outside the sole try/catch block you have. As you already have a method in your file, add another one for getting the ticker symbols and record any exceptions. Inside your sql_importer you assume that if there was an error at all, it's due to the first run of a symbol, and you blindly attempt to pull for a different date range. There's a whole host of things that could have also gone wrong, like a network or firewall issue, that caused the first exception and that will still exist when you try again inside the catch block. A potential rewrite could look something like: