I have a `pyspark.pandas.frame.DataFrame` that I need to write to a `Hive_metastore` table, but I am unable to do so. The DataFrame looks like this:
`df.dtypes` gives:

```
COL-a      int32
Date      object
COL-b    float64
COL-c    float64
dtype: object
```

This is my code:
```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType

my_schema = StructType([
    StructField("COL-a", IntegerType(), True),
    StructField("Date", DateType(), True),
    StructField("COL-b", FloatType(), True),
    StructField("COL-c", FloatType(), True),
])

df_spark = spark.createDataFrame(df, schema=my_schema)

# Create a temporary view of the DataFrame
df_spark.createOrReplaceTempView("hello_hi")

# Write the DataFrame to a table in the Hive metastore
spark.sql("CREATE OR REPLACE TABLE Hive_metastore.Random_loc_1.Final_res AS SELECT * FROM hello_hi")
```
I get the following error:
```
PySparkTypeError: [CANNOT_ACCEPT_OBJECT_IN_TYPE] `StructType` can not accept object `COL-a` in type `str`.
```
Why is this happening and how do I resolve it? Any help will be appreciated.

I have tried everything but am unable to resolve this `my_schema` issue. Neither ChatGPT nor the Databricks assistant could help, and even the official Spark documentation didn't have a good explanation.
`COL-a` is defined as an `IntegerType()` in your schema, but the error indicates it is being treated as a string (`str`), so there is a data type inconsistency. This is likely due to the way the pandas-on-Spark DataFrame `df` is being converted to a PySpark DataFrame: `spark.createDataFrame()` expects local data (a pandas DataFrame, a list of rows, or an RDD). When it is handed a `pyspark.pandas` DataFrame, it iterates over it the way you would iterate a dict, yielding the column names as plain strings, so the first column name `'COL-a'` is treated as a row and rejected by your `StructType`. Can you try this:
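Here is a minimal sketch of the fix, assuming `df` is the pandas-on-Spark DataFrame from your question (table and column names are taken from your post, and the `yyyy-MM-dd` date format is an assumption you should adjust to your data). Instead of `spark.createDataFrame()`, use `DataFrame.to_spark()` and then cast the columns to the types you wanted in `my_schema`:

```python
from pyspark.sql import functions as F

# A pandas-on-Spark DataFrame is already distributed, so convert it
# directly rather than going through spark.createDataFrame().
df_spark = df.to_spark()

# Cast each column to the type declared in my_schema. The Date column
# has object dtype, so parse it explicitly; the format string below is
# an assumption -- change it to match your actual date strings.
df_spark = (
    df_spark
    .withColumn("COL-a", df_spark["COL-a"].cast("int"))
    .withColumn("Date", F.to_date(df_spark["Date"], "yyyy-MM-dd"))
    .withColumn("COL-b", df_spark["COL-b"].cast("float"))
    .withColumn("COL-c", df_spark["COL-c"].cast("float"))
)

df_spark.createOrReplaceTempView("hello_hi")
spark.sql("CREATE OR REPLACE TABLE Hive_metastore.Random_loc_1.Final_res AS SELECT * FROM hello_hi")
```

Alternatively, you can skip the temp view and SQL statement entirely and write the table directly:

```python
df_spark.write.mode("overwrite").saveAsTable("Hive_metastore.Random_loc_1.Final_res")
```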