Selecting a column with backtick in its name - AnalysisException: cannot resolve Column


I have a data frame which has the below column:

Last Login- Date & Time(Incl. Time Zone)

When I read the data and print the schema, the column is listed as expected:
df.printSchema()

(screenshot: printSchema output, showing the column name wrapped in backticks)

But when I try selecting the column from the data frame it fails.

df.select(col("Last Login- Date & Time(Incl. Time Zone)"))

AnalysisException: cannot resolve '`Last Login- Date & Time(Incl. Time Zone)`'
given input columns: [`Last Login- Date & Time(Incl. Time Zone)`]

2 Answers

Answer by notNull (score 1):

Try replacing the backticks (`) in the column names with underscores (_), then select using the renamed column.

Example:

from pyspark.sql.functions import *
df = spark.createDataFrame([('1',)],['`Last Login- Date & Time(Incl. Time Zone)`'])
df = df.toDF(*(c.replace('`', '_') for c in df.columns))
df.selectExpr("`_Last Login- Date & Time(Incl. Time Zone)_`").show()
#+------------------------------------------+
#|_Last Login- Date & Time(Incl. Time Zone)_|
#+------------------------------------------+
#|                                         1|
#+------------------------------------------+
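More generally, the same rename-first approach can strip out every character that would otherwise need escaping, not just backticks. A minimal pure-Python sketch (the helper name `sanitize` is my own, not a Spark API):

```python
import re

def sanitize(name: str) -> str:
    """Replace each run of non-alphanumeric characters with a single underscore,
    and trim underscores from the ends."""
    return re.sub(r'[^0-9A-Za-z]+', '_', name).strip('_')

# Applied to the column name from the question:
print(sanitize('`Last Login- Date & Time(Incl. Time Zone)`'))
# Last_Login_Date_Time_Incl_Time_Zone
```

With that helper, `df = df.toDF(*(sanitize(c) for c in df.columns))` would rename every column in one pass, so no quoting is needed afterwards.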
Answer by ZygD (score 6):

As can be seen in the screenshot, the column name itself contains backticks (`). If that is not intentional, you may want to strip them out. If you do need to keep them, then when selecting the column you must escape each embedded backtick by doubling it and wrap the whole name in backticks, which yields three consecutive backticks at each end here:

from pyspark.sql import functions as F
df = spark.range(1).toDF('`Last Login- Date & Time(Incl. Time Zone)`')
df.printSchema()
# root
#  |-- `Last Login- Date & Time(Incl. Time Zone)`: long (nullable = false)

df.select(F.col("```Last Login- Date & Time(Incl. Time Zone)```")).show()
# +------------------------------------------+
# |`Last Login- Date & Time(Incl. Time Zone)`|
# +------------------------------------------+
# |                                         0|
# +------------------------------------------+
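The escaping rule applied by hand above can be captured in a small pure-Python helper (a sketch; `quote_column` is a hypothetical name, not part of the Spark API): double every backtick already in the name, then wrap the result in backticks.

```python
def quote_column(name: str) -> str:
    """Quote a column name for use in F.col()/selectExpr():
    double each embedded backtick, then wrap the whole name in backticks."""
    return '`' + name.replace('`', '``') + '`'

name = '`Last Login- Date & Time(Incl. Time Zone)`'
print(quote_column(name))
# ```Last Login- Date & Time(Incl. Time Zone)```
```

For the column in the question, the leading and trailing backticks in the name double to two, and the wrapping adds a third, reproducing the triple-backtick form used in the select above.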