I'm trying to read an Excel file using the PySpark code below:
df_data = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", f"'{sheet_name}'!A1") \
    .option("treatEmptyValuesAsNulls", "false") \
    .schema(custom_schema) \
    .load(file_path)
The mapping of the column names is not in the correct order as per the file. For example:
File:
col1  col2  col3
12    23    null

DataFrame output:
col2  col3  col1
null  12    23
How can I fix this so the column mapping matches the file's column order? Thanks in advance.
I have tried the approach below:
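A minimal sketch of that approach follows. The sheet name, schema, and file path shown here are placeholder assumptions standing in for the original sheet_name, custom_schema, and file_path values, which were not included in the post.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

# Placeholder values -- replace with your own sheet name, schema, and path
sheet_name = "Sheet1"
file_path = "/path/to/file.xlsx"
custom_schema = StructType([
    StructField("col1", IntegerType(), True),
    StructField("col2", IntegerType(), True),
    StructField("col3", IntegerType(), True),
])

# Read the Excel sheet with the custom schema
df_data = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", f"'{sheet_name}'!A1") \
    .option("treatEmptyValuesAsNulls", "false") \
    .schema(custom_schema) \
    .load(file_path)

# Explicitly select the columns in the order defined by the schema
df_data = df_data.select([field.name for field in custom_schema.fields])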
Results:
The above code reads the Excel file, applies the specified schema, and selects the columns in the desired order.