I'm trying to read an Excel file using the PySpark code below:
df_data = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", f"'{sheet_name}'!A1") \
    .option("treatEmptyValuesAsNulls", "false") \
    .schema(custom_schema) \
    .load(file_path)
The mapping of the column names is not in the correct order as per the file. For example:
File:
col1  col2  col3
12    23    null

DataFrame output:
col2  col3  col1
null  12    23
How can I fix this so the column mapping matches the file's column order? Thanks in advance.
I have tried the approach below:
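A minimal sketch of that approach follows. The sheet name, schema, and file path shown here are placeholder assumptions standing in for the original sheet_name, custom_schema, and file_path values, which were not included in the post.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

# Placeholder values -- replace with your own sheet name, schema, and path
sheet_name = "Sheet1"
file_path = "/path/to/file.xlsx"
custom_schema = StructType([
    StructField("col1", IntegerType(), True),
    StructField("col2", IntegerType(), True),
    StructField("col3", IntegerType(), True),
])

# Read the Excel sheet with the custom schema
df_data = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", f"'{sheet_name}'!A1") \
    .option("treatEmptyValuesAsNulls", "false") \
    .schema(custom_schema) \
    .load(file_path)

# Explicitly select the columns in the order defined by the schema
df_data = df_data.select([field.name for field in custom_schema.fields])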
Results:
The above code reads the Excel file, applies the specified schema, and selects the columns in the desired order.