I get the error above when I apply my UDF, which is defined as follows:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.typedLit
import org.apache.spark.sql.functions.udf
def method_name(map: Map[String, Array[String]]): String = {
  var col_a: Array[String] = map("a")
  var col_b: Array[String] = map("b")
  ...
  return "Test_string"
}
// excel is a DataFrame with columns "a" and "b"
val col_a = excel.select("a").rdd.map(r => r(0).asInstanceOf[String]).collect()
val col_b = excel.select("b").rdd.map(r => r(0).asInstanceOf[String]).collect()
var new_map: Map[String, Array[String]] = List("a" -> col_a).toMap
new_map += ("b" -> col_b)
val method_name_udf = udf(method_name _)
resultTable = resultTable.withColumn("new_map", typedLit(new_map))
resultTable = resultTable.withColumn("new_col", method_name_udf(col("new_map")))
- I use "rdd.map(r => r(0).asInstanceOf[String]).collect()" to get each column of the DataFrame as an Array of Strings.
- I define new_map as a Map from column name to that Array.
- I call withColumn on resultTable with typedLit, which appends new_map to every row in a new column "new_map".
- Lastly, I apply the UDF to create a new column that reads from "new_map".
- In the UDF I just want to get the Array of Strings by calling map("a"). This is where the error occurs.
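
One possible cause, assuming the error is a ClassCastException that mentions WrappedArray (or ArraySeq) and [Ljava.lang.String: Spark hands an array column to a Scala UDF as a Seq, not as an Array, so a parameter typed Map[String, Array[String]] fails at the point where the value is read. Below is a minimal sketch of the same steps with the parameter declared as Map[String, Seq[String]]. The object name MapUdfSketch, the sample rows, the local SparkSession and the return value are assumptions made for illustration, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit, udf}

object MapUdfSketch {
  // Array columns arrive in a Scala UDF as Seq, so the map values are typed Seq[String].
  def methodName(map: Map[String, Seq[String]]): String = {
    val colA: Seq[String] = map("a")
    val colB: Seq[String] = map("b")
    // Hypothetical return value: the lengths of both sequences.
    s"${colA.length}_${colB.length}"
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the "excel" DataFrame with columns "a" and "b".
    val excel = Seq(("x1", "y1"), ("x2", "y2")).toDF("a", "b")

    // Collect each column to the driver as Array[String].
    val colA = excel.select("a").as[String].collect()
    val colB = excel.select("b").as[String].collect()
    val newMap: Map[String, Array[String]] = Map("a" -> colA, "b" -> colB)

    val methodNameUdf = udf(methodName _)

    // typedLit attaches the same map literal to every row.
    val result = excel
      .withColumn("new_map", typedLit(newMap))
      .withColumn("new_col", methodNameUdf(col("new_map")))

    result.show(false)
    spark.stop()
  }
}
```

If an Array is really needed inside the function, calling .toArray on the Seq after the lookup should work. Note that this approach copies both full columns into every row, which may be expensive for large tables.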