SSIS package with Python Script doesn't run in a SQL Server Job

636 Views Asked by At

I have a question about running a Python script in a package. And then running that package in a Sql Server Job.

Goal

First, let me briefly describe what I would like to achieve: With a python script I want to fetch Yahoo Finance stock data for certain stocks and I want to save that data into a SQL Server database.

Next I would like to include that process as a step in a SQL Server Job, so that all the data (included the stock data) is loaded into my database at once. For me the easiest way to achieve that was to make a SSIS package with the script and then run the script in a Job.

I'm running all of this on my local server.

Problem

I've made the script and it runs succesfully in both the command interpreter, Visual Studio and SSMS (when I run the script manually): link

But when I run the package in a jobstep I got the following error: link

I already searched on StackOverflow, Google and ChatGPT but I didn't find the solution to solve this problem yet. I suspect that it probably has to something with the 'user' SQLSERVERAGENT, but I already added SQLSERVERAGENT to the folder where the Python files are located.

These are the settings I use in Visual Studio: link

And this is the Python script I'm running:

import pandas as pd
import yfinance as yf
from sqlalchemy import create_engine
import sqlalchemy
from datetime import datetime

servername = "XXXXXXX"
dbname = "sql_prtfl_dw"
table = "lz_yahoo_finance_stock_prices"

engine = create_engine("mssql+pyodbc://@"+ servername + "/"+ dbname + "?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server")
conn = engine.connect()

TickerSymbols = pd.read_sql(
    """
    SELECT 
        Ticker
    FROM DimAssets
    WHERE 1=1
        AND AssetClass IN ('Equities')
        AND IsCurrent = 1
    """
    , engine)

conn.invalidate()
engine.dispose()

TickerSymbols = TickerSymbols["Ticker"].values.tolist()

servername = "XXXXXXX"
dbname = "sql_prtfl_lz"
table = "lz_yahoo_finance_stock_prices"

engine = create_engine("mssql+pyodbc://@"+ servername + "/"+ dbname + "?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server")
conn = engine.connect()

def sql_importer(symbol, table=table, start="2019-01-01"): 
    try: 
        max_date = pd.to_datetime(pd.read_sql(f"SELECT MAX(Date) FROM [{table}] WHERE Ticker IN ('{symbol}')", engine).values[0][0])
        print(max_date)
        new_data = yf.download(symbol, start=max_date)
        new_data.insert(0, "Ticker", symbol)
        new_data.insert(7, "EtlLoadDate", datetime.now())
        new_rows = new_data[new_data.index > max_date]
        new_rows.to_sql(table, engine, if_exists="append")
        print(f"Ticker: {symbol} - {str(len(new_rows))} extra rows imported in table: {table}")
    except: 
        new_data = yf.download(symbol, start=start)
        new_data.insert(0, "Ticker", symbol)
        new_data.insert(7, "EtlLoadDate", datetime.now())
        new_data.to_sql(table, engine, if_exists="append")
        print(f"Ticker: {symbol} - First time import of data. {str(len(new_data))} new datarows are imported in table: {table}")
        
for ticker in TickerSymbols: 
    sql_importer(ticker)
    
conn.invalidate()
engine.dispose()

Thanks in advance!

2

There are 2 best solutions below

0
billinkc On
  1. If the only thing your SSIS package does is call the Execute Process Task, I would think it would greatly simplify your life to change the SQL Agent job type from SSIS package to Operating System task and the steps become what you have in the Execute Process Task. aka python myfile.py
  2. You mention SQLSERVERAGENT in your problem definition but I don't see any reference to it in the configuration you've provided, maybe it's just my aging eyes.
  3. If the SQL Agent account is running as a local user, and the server you are attempting to write data to is elsewhere, then yeah, you probably have a authorization/authentication problem. Running it from the command line or an editor will work fine as you're credentials are being presented to the server, which is what trusted_connection=yes is directing the driver to do
  4. Logging and catching are missing cases. If the root cause is authorization/authentication, then it'll hit at your first call to read_sql which is outside the sole try/catch block you have. As you already have a method in your file, add another one for getting the ticker symbols and record any exceptions. Inside your sql_importer you assume that if there was an error at all, it's due to the first run of a symbol and blindly attempts to pull for a different date range. There's a whole host of things that could have also gone wrong, like a network or firewall issue, that caused the first exception that are still going to exist when you try again inside the catch block.

A potential rewrite could look something like

import pandas as pd
import yfinance as yf
from sqlalchemy import create_engine
import sqlalchemy
from datetime import datetime

# Don't reuse variable names
# Datawarehouse variables
dw_servername = "XXXXXXX"
dw_dbname = "sql_prtfl_dw"

# Landing zone variables
servername = "XXXXXXX"
dbname = "sql_prtfl_lz"
table = "lz_yahoo_finance_stock_prices"

def get_ticker_symbols():
    engine = create_engine("mssql+pyodbc://@"+ dw_servername + "/"+ dw_dbname + "?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server")
    conn = engine.connect()
    TickerSymbols = []
    try:

        TickerSymbols = pd.read_sql(
        """
        SELECT 
            Ticker
        FROM DimAssets
        WHERE 1=1
            AND AssetClass IN ('Equities')
            AND IsCurrent = 1
        """
        , engine)

        conn.invalidate()
        engine.dispose()

        TickerSymbols = TickerSymbols["Ticker"].values.tolist()
    except Exception as ex:
        # TODO: at a minimum print the error, 
        # preferably use something better like the native logging library
        print(ex)
        pass

    return TickerSymbols

def sql_importer(symbol, table=table, start="2019-01-01"): 
    try: 
        max_date = pd.to_datetime(pd.read_sql(f"SELECT MAX(Date) FROM [{table}] WHERE Ticker IN ('{symbol}')", engine).values[0][0])
        print(max_date)
        new_data = yf.download(symbol, start=max_date)
        new_data.insert(0, "Ticker", symbol)
        new_data.insert(7, "EtlLoadDate", datetime.now())
        new_rows = new_data[new_data.index > max_date]
        new_rows.to_sql(table, engine, if_exists="append")
        print(f"Ticker: {symbol} - {str(len(new_rows))} extra rows imported in table: {table}")
    except Exception as ex:
        print("Hey, bad thing happened ")
        print(ex)
        new_data = yf.download(symbol, start=start)
        new_data.insert(0, "Ticker", symbol)
        new_data.insert(7, "EtlLoadDate", datetime.now())
        new_data.to_sql(table, engine, if_exists="append")
        print(f"Ticker: {symbol} - First time import of data. {str(len(new_data))} new datarows are imported in table: {table}")


# Spin up connection for the landing zone
engine = create_engine("mssql+pyodbc://@"+ servername + "/"+ dbname + "?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server")
conn = engine.connect()

# Or skip this assignment and directly enumerate by moving the function call below
TickerSymbols = get_ticker_symbols()
for ticker in TickerSymbols: 
    sql_importer(ticker)

conn.invalidate()
engine.dispose()
0
Feathers_McGraw On

Thanks for your response billinkc!

After looking at your feedback and some extra research I came to the following solution:

  1. As you already mentioned, it's easier to run the script through a CMD job, so that's what I did.
  2. But that still gave me an error. It couldn't find the packages that were used in the Python script.
  3. Because the Python script ran fine when I ran it myself I thought it had something to do with the access of SQLSERVERAGENT to the Python packages.
  4. I discovered that my first installation of Python was only for the current user and not for all users (in Windows). So I deinstalled Python and reinstalled it again for all users.
  5. The next step was following the process in this tutorial: https://www.mssqltips.com/sqlservertip/7083/run-python-scripts-sql-server-agent-job/.
  6. Like the author of the tutorial I made a virtual environment but I do think it would work without a virtual environment.

Long story short: because I changed a lot of things at the same time I'm not 100% sure what the cause of the error was. But I suspect it raised an error because Python was installed for the current user and the packages were stored in a folder only accessible for the current user. That would explain why I could run the Python script in a cmd or package but gave an error when ran in a job by SQLSERVERAGENT.