I am trying to read data from MongoDB in parallel and store all the dataframes and view names in a collection so that I can refer back to them.
For this, I created a collection in which I try to store the dataframes and views, but I get an error when appending an element to it. I tried Vector, List, and Seq, but none of them works for me.
Is there a way to handle this?
var mongoFrames = Nil
for(c <- collections) {
  var connectionString = connectionInt.setCollection(c);
  var dframe = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", connectionString).load()
  var view = dframe.createOrReplaceTempView(c);
  var mongoQuery = s"select * from $c where tuid in (${tuidIn.mkString(",")})";
  var tup = (c, dframe, view, mongoQuery)
  mongoFrames += tup
}
for(v <- mongoFrames) yield spark.sql(v._4).collect() // load data from source into spark
Update
When I try to use +:, I get the following error:
error: value +: is not a member of (String, org.apache.spark.sql.DataFrame, Unit, String)
       mongoFrames +: tup
You can write it as:
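A minimal sketch of the declaration, assuming the root of the problem is that `var mongoFrames = Nil` is inferred as `Nil.type`, which leaves no room for tuple elements. `Frame` is a hypothetical stand-in for `org.apache.spark.sql.DataFrame` so the snippet runs without Spark; `Entry` and `TypedCollection` are illustrative names as well.

```scala
object TypedCollection {
  // Hypothetical stand-in for org.apache.spark.sql.DataFrame (assumption, not Spark API).
  final case class Frame(name: String)

  // Mirrors the tuple from the question: (collection name, frame, view result, query).
  // The third slot is Unit because createOrReplaceTempView returns Unit.
  type Entry = (String, Frame, Unit, String)

  // Explicit element type; a bare `Nil` would be inferred as Nil.type.
  var mongoFrames: List[Entry] = Nil
}
```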
and
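A sketch of the append step, under the same assumptions (`Frame` is a hypothetical stand-in for a Spark `DataFrame`, and the collection names are made up). Note that operators ending in a colon are right-associative in Scala, so `mongoFrames +: tup` looks for `+:` on the tuple, which matches the error in the question. With `:+` the collection goes on the left.

```scala
object AppendDemo {
  // Hypothetical stand-in for org.apache.spark.sql.DataFrame (assumption).
  final case class Frame(name: String)
  type Entry = (String, Frame, Unit, String)

  var mongoFrames: List[Entry] = Nil

  // Illustrative collection names; the question's `collections` is not shown.
  for (c <- Seq("users", "orders")) {
    val tup: Entry = (c, Frame(c), (), s"select * from $c")
    // Append: immutable List, so reassign the var to the new list.
    mongoFrames = mongoFrames :+ tup
  }
}
```

To prepend instead, the element goes on the left: `tup +: mongoFrames`.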
then iterate over it:
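A sketch of the iteration, again with stand-ins: `runQuery` is a hypothetical replacement for `spark.sql(query).collect()` and `Frame` for a Spark `DataFrame`, so that the snippet runs on its own.

```scala
object IterateDemo {
  // Hypothetical stand-ins (assumptions, not Spark API).
  final case class Frame(name: String)
  type Entry = (String, Frame, Unit, String)
  def runQuery(query: String): Array[String] = Array(s"rows for: $query")

  val mongoFrames: List[Entry] = List(
    ("users", Frame("users"), (), "select * from users"),
    ("orders", Frame("orders"), (), "select * from orders")
  )

  // _4 is the query string, as in the question's loop.
  val results: List[Array[String]] =
    for (v <- mongoFrames) yield runQuery(v._4)
}
```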
Edit 1:
A more idiomatic way of iterating over the collection in this case is to write:
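A sketch of what this could look like, assuming the intended form is a pattern-matching anonymous function that destructures each tuple. `runQuery` and `Frame` are hypothetical stand-ins for `spark.sql(query).collect()` and a Spark `DataFrame`.

```scala
object PatternDemo {
  // Hypothetical stand-ins (assumptions, not Spark API).
  final case class Frame(name: String)
  type Entry = (String, Frame, Unit, String)
  def runQuery(query: String): Array[String] = Array(s"rows for: $query")

  val mongoFrames: List[Entry] = List(
    ("users", Frame("users"), (), "select * from users"),
    ("orders", Frame("orders"), (), "select * from orders")
  )

  // { case ... } names the tuple fields instead of using positional accessors like _4.
  val results: List[Array[String]] =
    mongoFrames.map { case (_, _, _, query) => runQuery(query) }
}
```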
using the anonymous function.
This is short for:
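A sketch of the longer form, assuming the shorthand in question is a `{ case ... }` block passed directly to `map` or `foreach`. Here the function parameter is named and matched explicitly. `runQuery` and `Frame` are hypothetical stand-ins, as is the sample data.

```scala
object ExplicitMatchDemo {
  // Hypothetical stand-ins (assumptions, not Spark API).
  final case class Frame(name: String)
  type Entry = (String, Frame, Unit, String)
  def runQuery(query: String): Array[String] = Array(s"rows for: $query")

  val mongoFrames: List[Entry] = List(
    ("users", Frame("users"), (), "select * from users")
  )

  // Explicit parameter plus match; equivalent to passing { case ... } directly.
  val results: List[Array[String]] =
    mongoFrames.map(entry => entry match {
      case (_, _, _, query) => runQuery(query)
    })
}
```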