I have a PySpark DataFrame with 4 columns:
1) Country
2) col1 [numeric]
3) col2 [numeric]
4) col3 [numeric]
I have a UDF that takes a number and formats it to xx.xx (2 decimal places). Using the withColumn function, I can call the UDF and format the numbers.
Example:
df = df.withColumn("col1", num_udf(df.col1))
df = df.withColumn("col2", num_udf(df.col2))
df = df.withColumn("col3", num_udf(df.col3))
What I'm looking for: can we run this UDF on each column in parallel, instead of running it in sequence?
Not sure why you want to run it in parallel, but you can achieve it by using rdd and map:
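Here is a minimal sketch, assuming the formatting logic inside your num_udf is simply "render with two decimals" (replace fmt with whatever your UDF actually does):

# Rebuild every row with all three columns formatted in a single map pass,
# instead of three chained withColumn calls.
def fmt(x):
    return "{:.2f}".format(x) if x is not None else None

df = df.rdd.map(
    lambda row: (row.Country, fmt(row.col1), fmt(row.col2), fmt(row.col3))
).toDF(["Country", "col1", "col2", "col3"])

Note that all three columns are transformed in one pass over the rows here; with chained withColumn calls, Spark's optimizer also collapses the three expressions into a single projection, so the sequential-looking code does not actually mean three separate passes over the data.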