I have a PySpark dataframe with the data below:
[
My code:
from pyspark.sql import Window, functions as F
from pyspark.sql.functions import col

W = Window.partitionBy("A").orderBy(col("C"))
main_df = main_df.withColumn("cnt", F.count("B").over(W))
Is there something wrong with how I have used the count function? What can I do so that the values in column 'Actual' match 'Expecting'? I see two issues with my output:
- the count starts at 1 when it should start at 0
- for each group, the last count is assigned instead of the first
Can anyone help me resolve this issue?
Try the dense_rank window function and subtract 1 from the dense_rank value. Example:
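Below is a minimal sketch of that approach. The window definition and the column names A, B, C are taken from the question; the sample rows and the getOrCreate() session are assumptions added for illustration, since the question's actual data was not shown.

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample rows for illustration only; the question's real data was not shown.
main_df = spark.createDataFrame(
    [("x", 1, 10), ("x", 2, 10), ("x", 3, 20), ("y", 4, 5)],
    ["A", "B", "C"],
)

W = Window.partitionBy("A").orderBy(col("C"))

# dense_rank numbers rows 1, 2, 3, ... within each partition and gives tied C values the same rank;
# subtracting 1 makes the count start at 0.
main_df = main_df.withColumn("cnt", F.dense_rank().over(W) - 1)
main_df.show()

With these sample rows, partition "x" ordered by C = 10, 10, 20 yields cnt = 0, 0, 1: the count starts at 0 and tied rows receive the first value rather than the last, which addresses both points raised in the question.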