So I have a variable data, which is an RDD[Array[String]]. I want to iterate over it and compare adjacent elements. To do this I must create a Dataset from the RDD. I try the following, where sc is my SparkContext:
import org.apache.spark.sql.SQLContext
val sqc = new SQLContext(sc)
val lines = sqc.createDataset(data)
And I get the following two errors:
Error:(12, 34) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ Support for serializing other types will be added in future releases. val lines = sqc.createDataset(data)
Error:(12, 34) not enough arguments for method createDataset: (implicit evidence$4: org.apache.spark.sql.Encoder[Array[String]])org.apache.spark.sql.Dataset[Array[String]]. Unspecified value parameter evidence$4. val lines = sqc.createDataset(data)
Sure, I understand I need to pass an Encoder argument. However, what would it be in this case, and how do I import Encoders? When I try myself, it says that createDataset does not take that as an argument.
There are similar questions, but they do not answer how to use the encoder argument. If my RDD is an RDD[String] it works perfectly fine; however, in this case it is an RDD[Array[String]].
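For what it's worth, one way around the missing implicit is to supply the Encoder yourself, e.g. with Encoders.kryo, which can serialize arbitrary types. This is a minimal sketch, assuming a local Spark classpath; the sample data here is made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Encoder, Encoders, SQLContext}

val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("encoder-demo"))
val sqc = new SQLContext(sc)

// Stand-in for the RDD[Array[String]] from the question (sample data made up).
val data = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))

// Encoders.kryo can serialize arbitrary types, including Array[String],
// so it satisfies the implicit evidence parameter of createDataset.
implicit val arrayEncoder: Encoder[Array[String]] = Encoders.kryo[Array[String]]

val lines = sqc.createDataset(data) // compiles now that an encoder is in scope
```

Note that a kryo-encoded Dataset stores each array as an opaque binary blob, so it works for iteration but not for column-level SQL operations.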
All of the comments on the question are trying to tell you the following: you say you have an RDD[Array[String]]. To convert that rdd to a dataframe, call .toDF on it, but before doing so you need to import implicits._ from sqlContext. You should then have a dataframe. Isn't this all simple?
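To make those steps concrete, here is a minimal end-to-end sketch, assuming a local Spark 2.x classpath (where sqlContext.implicits._ provides an implicit encoder for Array[String]); the sample data and column name are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("toDF-demo"))
val sqlContext = new SQLContext(sc)

// An RDD[Array[String]] like the one in the question (sample data made up).
val rdd = sc.parallelize(Seq(Array("a", "1"), Array("b", "2")))

// Brings .toDF and the standard implicit encoders into scope.
import sqlContext.implicits._

// Each Array[String] becomes one row with a single array-typed column.
val df = rdd.toDF("tokens")
df.printSchema()
```

Unlike the kryo route, this keeps the data as a real array column, so you can still use SQL functions on it afterwards.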