Spark HiveContext apply IN operation using sql method

65 Views Asked by At

I have an employee hive table with column Name , Department , City and i want to retrieve the data based on names of the employee using IN operation in HiveContext.sql() function but it is throwing pyspark.Analysis Exception.please look at example below.

employee table:

Name   Department  City
Ram     FDE        Mumbai
Ramesh  CTZ        Pune
Suraj   FDE        Chennai
Varun   CTZ        Delhi

Query :

SELECT * from employee WHERE Name in ('Ramesh' , 'Varun')

code snippet from spark program:

namesList= ['Ramesh' , 'Varun']
data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))

I tried to modify and pass the strings instead of list but the error remains same

Error:pyspark.AnalysisException : structType field

please help me on this and suggest me if i'm doing something wrong here.

2

There are 2 best solutions below

0
matkurek On

You should get rid of square brackets from the python list when creating the query - str(namesList)[1:-1]

data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = str(namesList)[1:-1]))
0
Shubham Jain On

Replace this

data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))

with this

data = HiveContext.sql("SELECT * from employee WHERE Name in ({namesList})".format(namesList = "'"+"','".join(namesList)+"'"))

You need to pass the string not list.