I have a small question that I hope the Spark gurus can help me with.
I have a Parquet file, person.parquet, that has multiple columns and one row. One of the columns, "Middle Name", has a space in its name, which causes an error when Spark writes the data to Parquet format.
What I did was rename the column to remove the space, as below:
SourceData = SourceData.withColumnRenamed("Middle Name","MiddleName")
But when I try to write SourceData to a Parquet file, it still returns this error:
Caused by: org.apache.spark.sql.AnalysisException: Attribute name "Middle Name" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
So I used the line below, which makes the error go away:
SourceData = spark.read.schema(SourceData.schema).parquet(TestingPath)
Unfortunately, the generated file has a null value in the MiddleName column.
Any suggestions on how to solve this?
Try quoting the column name with a pair of backticks (`), for example `Middle Name`.
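If more than one column might contain these characters, a small helper can clean all the names in one pass. This is only a sketch in plain Python: the names sanitize_column_name and sanitize_columns are made up for illustration and are not part of Spark, and the character set is copied from the error message in the question.

```python
# Characters listed as invalid in the AnalysisException message above.
INVALID_CHARS = " ,;{}()\n\t="


def sanitize_column_name(name, replacement=""):
    """Return `name` with every invalid character replaced by `replacement`."""
    # str.translate maps each invalid character's code point to the replacement.
    table = {ord(ch): replacement for ch in INVALID_CHARS}
    return name.translate(table)


def sanitize_columns(names, replacement=""):
    """Apply sanitize_column_name to a list of column names, preserving order."""
    return [sanitize_column_name(n, replacement) for n in names]


if __name__ == "__main__":
    print(sanitize_columns(["First Name", "Middle Name", "Age"]))
```

With a DataFrame, the cleaned names could then be applied with something like SourceData.toDF(*sanitize_columns(SourceData.columns)), assuming SourceData was read from the original file without an overriding schema. As for the null values: Parquet columns are matched by name, so reading the file with a schema that asks for MiddleName, while the file itself stores "Middle Name", most likely finds no matching column and fills it with nulls.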