Does Spark's collect() action, after an orderBy, return the rows in sorted order?


I have a DataFrame in Spark (Scala). When I call collect() on the DataFrame after an orderBy, will the sorted order be preserved in the collected Scala collection?

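// Assumes an existing SparkSession named `spark` (for example, in spark-shell)
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Array(
                        StructField("language", StringType, true),
                        StructField("users", IntegerType, true)))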

val data = Seq(Row("Java", 20000), 
                 Row("Python", 100000), 
                 Row("Scala", 3000))

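// createDataFrame with an explicit schema expects an RDD[Row] (or a java.util.List[Row]), not a Seq[Row]
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)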

// Perform an orderBy operation. This sorts the data by
// the number of users in descending order.
val dfSorted = df.orderBy(col("users").desc)

// Now I collect the data. This is where I am not sure whether the data will
// still be sorted, because the sort may run across several executors
// in the cluster.
val collectedDataList = dfSorted.collect()

I know that a Scala collection keeps its elements in order once it is built, but I am not sure whether collect() returns the rows in the sorted order.
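One way to probe this empirically is to force the data onto several partitions, sort, collect, and check the result on the driver. Below is a minimal sketch of that check, assuming a local session started with `local[4]`; the object name `CollectOrderCheck` and the helper `isDescending` are hypothetical names chosen for illustration. As far as I understand, orderBy performs a global sort (unlike sortWithinPartitions), but a single run on three rows only shows the behavior for that run and is not proof of a guarantee.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CollectOrderCheck {
  // Hypothetical helper: true if every element is >= the one after it.
  def isDescending(xs: Seq[Int]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a >= b }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session with several cores, so work is spread over multiple tasks.
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("collect-order-check")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("Java", 20000), ("Python", 100000), ("Scala", 3000))
      .toDF("language", "users")
      .repartition(3) // spread the rows over more than one partition before sorting

    // Sort on the cluster, then bring the rows back to the driver.
    val users = df.orderBy(col("users").desc)
      .collect()
      .map(_.getAs[Int]("users"))
      .toSeq

    println(s"collected users: ${users.mkString(", ")}")
    println(s"descending after collect: ${isDescending(users)}")

    spark.stop()
  }
}
```

Running the same check on a larger dataset with more partitions would give more confidence than this three-row example.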
