I have a DataFrame in Spark (Scala). If I call collect on the DataFrame after an orderBy, will the sort order be preserved in the collected Scala array?
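import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Array(
  StructField("language", StringType, true),
  StructField("users", IntegerType, true)))
val data = Seq(Row("Java", 20000),
  Row("Python", 100000),
  Row("Scala", 3000))
// createDataFrame has no (Seq[Row], StructType) overload, so go through an RDD[Row]
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)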
// Perform an orderBy operation. This sorts the data by
// the number of users in descending order.
val dfSorted = df.orderBy(col("users").desc)
// Now I collect the data. This is where I am not sure whether the rows will
// still be sorted, because the sort may be carried out across several executors
// in the cluster.
val collectedDataList = dfSorted.collect()
I know that a Scala collection keeps its elements in order, but I am not sure whether the collect operation returns the rows in sorted order.
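
One way to probe this empirically is to compare the collected values against a locally sorted copy. The following is only a minimal sketch: the local session settings, the app name, and the names users and isDescending are assumptions made for illustration, and a single passing run on three rows shows the behaviour for that run only, not a guarantee.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumption: a local session for illustration; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collect-order-check")
  .getOrCreate()

val schema = StructType(Array(
  StructField("language", StringType, true),
  StructField("users", IntegerType, true)))
val data = Seq(Row("Java", 20000), Row("Python", 100000), Row("Scala", 3000))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

// Sort on the cluster, then bring the rows back to the driver.
val collected: Array[Row] = df.orderBy(col("users").desc).collect()

// Extract the sort key and compare it with a copy sorted locally in descending order.
val users: Array[Int] = collected.map(_.getAs[Int]("users"))
val isDescending: Boolean = users.sameElements(users.sorted(Ordering[Int].reverse))

println(s"users = ${users.mkString(", ")}; descending = $isDescending")
```

Running the same check on a larger dataset spread over several partitions would be a more meaningful test than this three-row example.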