Error connecting PySpark to PostgreSQL in a Jupyter notebook


I am doing some tests using a Python Jupyter Notebook in Visual Studio Code to connect a local PySpark session to my localhost PostgreSQL instance, which is running as a Docker container.

from pyspark.sql import SparkSession

# create a local Spark session with the PostgreSQL JDBC driver as a package dependency
spark = SparkSession.builder \
    .appName("ETL_PostgreSQL") \
    .config("spark.master", "local") \
    .config("spark.jars.packages", "org.postgresql:postgresql:42.5.4") \
    .getOrCreate()

# Source PostgreSQL database connection settings
source_url = "jdbc:postgresql://localhost:5430/chinook"
source_properties = {
    "user": "root",
    "password": "****",
    "driver": "org.postgresql.Driver"
}

table_df = spark.read.jdbc(url=source_url, table="genre", properties=source_properties)
table_df.show()

spark.stop()

I get the following error on the spark.read.jdbc call:

...
Py4JJavaError: An error occurred while calling o1946.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver
...

I already checked the Java version ("1.8.0_401"), the Windows system variables JAVA_HOME and PYTHONPATH, and the Py4J installation. I also tried different config("spark.jars", ...) configurations. There is no problem connecting to the database using the psycopg2 library.
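For reference, one of the spark.jars variants I tried looked roughly like the sketch below: downloading the PostgreSQL JDBC JAR manually and passing it via PYSPARK_SUBMIT_ARGS before any SparkSession is created (the JAR path here is just a placeholder for wherever the driver was downloaded):

```python
import os

# Placeholder path -- substitute the actual location of the downloaded driver JAR.
driver_jar = "C:/spark_jars/postgresql-42.5.4.jar"

# This environment variable must be set before the first SparkSession is created
# in the notebook kernel; once the JVM is running, extra JARs are not picked up.
os.environ["PYSPARK_SUBMIT_ARGS"] = f"--jars {driver_jar} pyspark-shell"
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

After setting this I restarted the kernel and re-ran the SparkSession builder from the snippet above, but the result was the same.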

Can you please help me with this error? Thank you!
