We have a PySpark DataFrame containing rate codes, which we must use to assign discounted offers to our customers.
- ratecode - the actual rate code
- weeklyrate - the weekly dollar amount that the customer will pay
- area - area of residence
- frequency -
- offer1 - the first discounted offer to the customer
- offer2 - the second discounted offer to the customer
The problem is to find, for each row, the "ratecode" whose "weeklyrate" is closest to "offer1" (saved as "offer1Ratecode") and to "offer2" (saved as "offer2Ratecode").
Explanation:
- for "offer1" = 4.4, the "offer1Ratecode" is R1, because the closest "weeklyrate" to 4.4 is 5.5, and 5.5 corresponds to "ratecode" R1
- for "offer1" = 6, the "offer1Ratecode" is R2, because the closest "weeklyrate" to 6 is 6.2, and 6.2 corresponds to "ratecode" R2

Input:
One way would be using `crossJoin` and `groupBy`; another could be using window functions and the higher-order `transform` function.