Error in getSparkSession() : SparkSession not initialized

I ran a single line of code to make a Spark DataFrame. I installed SparkR and used library(SparkR) before I ran the following:

spark_df <- as.DataFrame(data)

However, I get the following error message:

Error in getSparkSession() : SparkSession not initialized

What do I need to do?

There are 2 best solutions below

Answer from CRAFTY DBA:

This applies to R notebooks on Databricks. Make sure you load the library first. The first example loads the diamonds dataset, which is famous for linear regression.

Refer to the docs for details: https://docs.databricks.com/sparkr/overview.html

library(SparkR)
diamondsDF <- read.df("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", source = "csv", header = "true", inferSchema = "true")
head(diamondsDF)

You can also create a Spark DataFrame directly from a local R data frame:

library(SparkR)
df <- createDataFrame(faithful)

# Display the first rows of the DataFrame
head(df)

Answer from CRAFTY DBA:

The problem must be in your environment, since both blocks of code run without errors on Databricks.

Here is the output from the second block of code: [screenshot not preserved]

Here is the output from the first block of code: [screenshot not preserved]

Another syntax also works fine: [screenshot not preserved]

Are you using RStudio with a Spark installation that is not local? If so, read the docs on how to connect. Databricks sets up the Spark session for you automatically!

https://spark.apache.org/docs/latest/sparkr.html
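
If you are running outside Databricks (for example in RStudio or a plain R console), the usual cause of this error is that no session was started before the first DataFrame call. Here is a minimal sketch of that fix. It assumes Spark is installed on the same machine and that SPARK_HOME points at it; the master value "local[*]" and the app name are placeholders I picked for illustration, not something from the question, so swap in your own cluster URL if Spark runs elsewhere.

```r
library(SparkR)

# Start (or reuse) a Spark session. Outside Databricks this has to happen
# before any SparkR DataFrame function is called.
# Assumption: "local[*]" runs Spark on this machine using all available cores.
sparkR.session(master = "local[*]", appName = "sparkr-session-example")

# With a session in place, converting a local R data frame no longer fails.
# faithful is a built-in R dataset, used here in place of the asker's data.
spark_df <- as.DataFrame(faithful)
head(spark_df)

# Pull the rows back into an ordinary local R data frame.
local_df <- collect(spark_df)

# Shut the session down when finished.
sparkR.session.stop()
```

If library(SparkR) itself fails in RStudio, the SparkR package that ships inside the Spark distribution may not be on your library path; the SparkR guide linked above covers that case.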