I have a simplified reproduction of an error that sometimes occurs when converting a pandas DataFrame to a pandas-on-Spark DataFrame.
The notebook runs fine when I execute it interactively with the "Run All" button, but the conversion raises an AttributeError whenever it runs as a scheduled job.
In both cases the notebook runs on Databricks 13.3 LTS ML with similar cluster configurations, and the pyspark package version is 3.5.1 in both.
The notebook contains this code:
import pyspark
import pandas as pd
data_dict = {"A": [1,3,2], "B": [3,5,4]}
pandas_df = pd.DataFrame(data=data_dict)
output_pdf = pyspark.pandas.from_pandas(pandas_df)
display(output_pdf)
The error occurs on the `from_pandas` line, but only when the notebook runs as a scheduled workflow job; the same code runs without error when I run the notebook manually with "Run All".
Here is the error I get when running as a scheduled job:
AttributeError: module 'pyspark' has no attribute 'pandas'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File <command-2134319411417476>, line 1
----> 1 output_pdf = pyspark.pandas.from_pandas(pandas_df)
2 display(output_pdf)
AttributeError: module 'pyspark' has no attribute 'pandas'
I tried restarting the cluster to see whether the manually run notebook would then fail too, but it still succeeded.
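My current hypothesis (an assumption on my part, not something I have confirmed on Databricks) is that `import pyspark` alone does not import the `pyspark.pandas` submodule, and that the interactive notebook environment pre-imports it while the job environment does not. In plain Python, importing a package does not bind its submodules as attributes unless the package's `__init__` imports them, which this stdlib-only sketch illustrates with the `xml` package:

```python
import sys

# Simulate a fresh interpreter: drop any cached xml modules so the
# package import below starts from a clean state.
for name in [m for m in list(sys.modules) if m == "xml" or m.startswith("xml.")]:
    del sys.modules[name]

import xml  # package import only; submodules are not imported

before = hasattr(xml, "etree")  # 'etree' is not bound yet

import xml.etree.ElementTree  # explicit submodule import binds xml.etree

after = hasattr(xml, "etree")  # now the attribute exists
print(before, after)
```

If that is what is happening with pyspark, an explicit `import pyspark.pandas as ps` at the top of the notebook should make the scheduled run succeed as well, but I would still like to understand why the two environments behave differently.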
What am I missing?