Cummulative substraction of 2 values using SQL or Python

62 Views Asked by bisk d At 22 February 2024 at 09:49

rake_name	origin	destination	created_At	bpc_issue_date	departure_time	journey_distance	bpc_balance_distance_km	abc	rn	logic
ARIL-05	a	b	09-02-2024 06:37	08-02-2024 18:30	09-02-2024 13:40	173.4600067	5271	5097.539993	1	5271-173.4600067
ARIL-05	a	b	11-02-2024 02:58	08-02-2024 18:30	11-02-2024 04:15	1099.369995	5271	3998	2	5097-1099
ARIL-05	a	b	14-02-2024 12:46	08-02-2024 18:30	14-02-2024 15:50	1099.369995	5271	2898	3	3998-1099
ARIL-05	a	b	16-02-2024 17:29	08-02-2024 18:30	17-02-2024 02:30	175.6600037	5271	2713	4	2898-175
ARIL-05	a	b	17-02-2024 10:54	08-02-2024 18:30	18-02-2024 11:30	1178.880005	5271	1544	5	2713-1178
ARIL-05	a	b	20-02-2024 11:44	08-02-2024 18:30	20-02-2024 15:20	1179.369995	5271	364	6	1544-1179

Please look at the logic column for substraction i want.

so i want to update abc column with difference between bpc_balance_distance_km and journey_distance once i get that value i want to substract below abc values with the result of abc - respective journey_distance.

i have tried sql, however i am not sure how can i populate row >1, am able to populate 1st row using case when rn= 1 then bpc_balance_km - journey_distance

i have tried following query however, i awnt to covert this query int databricks sql supported query. this query works

SET @newabc := 0;

WITH cte AS (
 SELECT *,
       CASE WHEN rn=1 THEN @newabc := bpc_balance_distance_km-journey_distance
         ELSE @newabc := @newabc-journey_distance END AS nabc
 FROM derived_abc ORDER BY rn)
 SELECT *, ROUND(nabc) AS final_result
 FROM cte;

any suggastion in python or sql will be helpful. here is DB fiddle link : https://dbfiddle.uk/JBzaylSb i am using databricks sql

Original Q&A

There are 2 best solutions below

FanoFN On 23 February 2024 at 01:51 BEST ANSWER

Here, try this:

SET @newabc := 0;

WITH cte AS (
 SELECT *,
       CASE WHEN rn=1 THEN @newabc := bpc_balance_distance_km-journey_distance
         ELSE @newabc := @newabc-journey_distance END AS nabc
 FROM derived_abc ORDER BY rn)
 SELECT *, ROUND(nabc) AS final_result
 FROM cte;

In the query above, the value of bpc_balance_distance_km-journey_distance for rn=1 is assigned to the @newabc variable. Then for rows where rn > 1, it'll take the current assigned @newabc value to subtract with the row's journey_distance value.

There'll be a slight different result in which causes by what I already point out in the comment - the inconsistencies of the decimal treatment. You can see the example in this demo fiddle.

bisk d On 23 February 2024 at 08:01

def calculate_nabc(rn, bpc_balance_distance_km, journey_distance, newabc):
    if rn == 1:
        newabc[0] = bpc_balance_distance_km - journey_distance
    else:
        newabc[0] = newabc[0] - journey_distance
    return newabc[0]

def calculate_final_result(df):
    # Add a row number column
    df['rn'] = df.groupby('rake_name').cumcount() + 1
    
    # Initialize the value of newabc
    newabc = [0.0]

    # Calculate the nabc column using the UDF
    df['nabc'] = df.apply(lambda row: calculate_nabc(row['rn'], row['bpc_balance_distance_km'], row['journey_distance'], newabc), axis=1)

    # Round the nabc values
    df['final_result'] = df['nabc'].round()

    return df
df = calculate_final_result(df)
df.display()

Cummulative substraction of 2 values using SQL or Python

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in MYSQL

Related Questions in DATABRICKS

Related Questions in DATABRICKS-SQL

Trending Questions

Popular # Hahtags

Popular Questions