I get the error above when I apply my UDF, which is defined as follows:

import org.apache.spark.sql.functions.{col, typedLit, udf}

def method_name(map:Map[String, Array[String]]):String = { 
    var col_a:Array[String] = map("a")
    var col_b:Array[String] = map("b")
    ...
    return "Test_string"
}

// excel is a DataFrame with columns "a" and "b"
val col_a = excel.select("a").rdd.map(r => r(0).asInstanceOf[String]).collect()
val col_b = excel.select("b").rdd.map(r => r(0).asInstanceOf[String]).collect()
var new_map: Map[String, Array[String]] =  List("a" -> col_a).toMap
new_map += ("b" -> col_b)

val method_name_udf = udf(method_name _)
resultTable = resultTable.withColumn("new_map", typedLit(new_map))
resultTable = resultTable.withColumn("new_col", method_name_udf(col("new_map"))) 

  • I use "rdd.map(r => r(0).asInstanceOf[String]).collect()" to get each column of the DataFrame as an Array of Strings.
  • I define new_map as a Map from the column name to that array.
  • I call withColumn on resultTable with typedLit, which appends new_map to every row in a new column "new_map".
  • Lastly, I apply the UDF to that column and store the result in a new column "new_col".
  • In the UDF I just want to get the Array of Strings with map("a"). This is where the error occurs.
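
The error text itself is not shown here, so the following is only a sketch under one assumption: that the failure is a ClassCastException at map("a"). Spark passes array-typed column values to a Scala UDF as a Seq rather than a Java array, so declaring the parameter as Map[String, Seq[String]] is the usual way around that. The object name UdfMapSketch, the sample rows, the "id" column and the body of methodName are hypothetical stand-ins, since the original logic is elided.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit, udf}

object UdfMapSketch {
  // Array columns reach a Scala UDF as Seq, so the map values are
  // declared as Seq[String] instead of Array[String].
  def methodName(map: Map[String, Seq[String]]): String = {
    val colA: Seq[String] = map("a")
    val colB: Seq[String] = map("b")
    // Placeholder body: the real logic is elided in the question.
    s"${colA.length} values in a, ${colB.length} values in b"
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-map-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the `excel` and `resultTable` DataFrames.
    val excel = Seq(("x", "1"), ("y", "2")).toDF("a", "b")
    val resultTable = Seq(1, 2).toDF("id")

    // Collect each column to the driver as an Array[String].
    val colA = excel.select("a").as[String].collect()
    val colB = excel.select("b").as[String].collect()
    val newMap: Map[String, Array[String]] = Map("a" -> colA, "b" -> colB)

    val methodNameUdf = udf(methodName _)
    val withResult = resultTable
      .withColumn("new_map", typedLit(newMap))               // same map literal on every row
      .withColumn("new_col", methodNameUdf(col("new_map")))  // UDF reads the map column

    withResult.select("id", "new_col").show(false)
    spark.stop()
  }
}
```

If the UDF really needs an Array[String], calling .toArray on the Seq inside the function is an option. Note also that typedLit copies the whole map into every row, which may get expensive for large columns.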