How to replace null values with some value using coalesce in PySpark


I have two files: orders_renamed.csv and customers.csv. I am joining them with a full outer join and then dropping the duplicate column (customer_id). I want to replace null values in the "order_id" column with -1.

I have tried this:

from pyspark.sql.functions import regexp_extract, monotonically_increasing_id, unix_timestamp, from_unixtime, coalesce
from pyspark.sql.types import IntegerType, StructField, StructType, StringType

ordersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/orders_renamed.csv").load()

customersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/customers.csv").load()

joinCondition1 = ordersDf.customer_id == customersDf.customer_id

joinType1 = "outer"   


joinenullreplace = ordersDf.join(customersDf, joinCondition1, joinType1) \
    .drop(ordersDf.customer_id) \
    .select("order_id", "customer_id", "customer_fname") \
    .sort("order_id") \
    .withColumn("order_id", coalesce("order_id", -1))


joinenullreplace.show(50) 

In the last line I have used coalesce, but it gives me an error. I have tried multiple approaches, like treating coalesce as a single expression and applying expr, but that did not work. I have also used lit, but it did not work. Please suggest a solution.


1 Answer

Answered by mpr:
coalesce expects its arguments to be columns (or column names), not plain Python values, so the error comes from passing the bare integer -1. Wrap the literal in lit():

from pyspark.sql.functions import lit
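
A minimal sketch of the corrected last step, assuming the same ordersDf, customersDf, and joinCondition1 as in the question; the only change is in the withColumn call, where -1 becomes lit(-1):

from pyspark.sql.functions import coalesce, col, lit

# lit(-1) wraps the Python integer in a Column, which is what coalesce requires
joinenullreplace = ordersDf.join(customersDf, joinCondition1, joinType1) \
    .drop(ordersDf.customer_id) \
    .select("order_id", "customer_id", "customer_fname") \
    .sort("order_id") \
    .withColumn("order_id", coalesce(col("order_id"), lit(-1)))

joinenullreplace.show(50)

Alternatively, the same replacement can be done without coalesce using fillna, e.g. joinenullreplace.fillna(-1, subset=["order_id"]).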