How to prevent PySpark from reading a Parquet file's header record as just another row instead of treating it as the header?


I have a Parquet file with 11 columns. I tried the approaches below in PySpark to read the file, but it still assigns column names like Prop_0, Prop_1, Prop_2 instead of using the first record as the header row.

1.

spark.read.parquet("/FileStore/tables/Order.parquet").show()
dfpq_new = spark.read.format("parquet").load("/FileStore/tables/Order-1.parquet")
dfpq_new = spark.read.format("parquet").option("header", True).option("inferSchema", True).load("/FileStore/tables/Order-1.parquet")

(Screenshot: output shows column headers prop_0, prop_1, … instead of the header names expected from the Parquet file.)

However, when I create a DataFrame, save it as a Parquet file, and then read it back:

data1 = [("Bob", "IT", 4500),
         ("Maria", "IT", 4600),
         ("James", "IT", 3850),
         ("Maria", "HR", 4500),
         ("James", "IT", 4500),
         ("Sam", "HR", 3300),
         ("Jen", "HR", 3900),
         ("Jeff", "Marketing", 4500),
         ("Anand", "Marketing", 2000),
         ("Shaid", "IT", 3850)]
col = ["Name", "MBA_Stream", "SEM_MARKS"]
marks_pq_df = spark.createDataFrame(data1, col)
marks_pq_df.write.parquet("/FileStore/table/markspq.parquet", mode='overwrite')

spark.read.format("parquet").load("/FileStore/table/markspq.parquet").show()

(Screenshot: output shows the column names Name, MBA_Stream, SEM_MARKS read back from the Parquet file.)

I am using Databricks Community Edition.
