When using a Spark DataFrame, I call the cast() function with decimal(38,16); however, the result is calculated not to 16 decimal places but only to 14, with the remaining digits filled with zeros.
The relevant code is shown below:
from pyspark.sql.functions import avg, col, datediff, when
from pyspark.sql.types import DecimalType

avg(
    when(
        (col("a.is_debtor") == 1) & (col("a.is_kk") == 0) & (col("a.is_open") == 0),
        datediff(col("a.date_end_plan"), col("a.date_end_fact")) / (365.25 / 12)
    )
).cast(DecimalType(38, 20)).alias("loan_avg_plan_fact_closed_mm_cnt")
This outputs the following result:
+--------------------------------+
|loan_avg_plan_fact_closed_mm_cnt|
+--------------------------------+
|             19.3593429158110900|
+--------------------------------+
However, the result should look like this:
+--------------------------------+
|loan_avg_plan_fact_closed_mm_cnt|
+--------------------------------+
|             19.3593429158110870|
+--------------------------------+
I have tried putting CAST in every possible place in the code, to no avail. Interestingly, if I remove the division by (365.25 / 12), the result is correct. Please help me; I have been struggling with this error for a week.
You've likely already identified the problem yourself; proving it would be difficult without a reproducible sample, though.
(365.25 / 12) is a DoubleType, not a DecimalType, so the division is evaluated in double precision (roughly 15-17 significant digits) before your outer cast ever applies. Convert it to a Decimal with another cast around it of 38,20 to keep everything Decimal.
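Here's a minimal sketch of that fix. The precision/scale choices for the intermediate casts are my own assumptions, not something from your post; note that 365.25 / 12 is exactly 30.4375, so the literal itself is representable as a double, and the precision is only lost when the division runs in DoubleType:

from pyspark.sql.functions import avg, col, datediff, lit, when
from pyspark.sql.types import DecimalType

# Cast the divisor to Decimal so the division is performed in decimal
# arithmetic instead of being promoted to DoubleType.
months_in_year = lit(365.25 / 12).cast(DecimalType(6, 4))  # 30.4375

loan_avg = avg(
    when(
        (col("a.is_debtor") == 1) & (col("a.is_kk") == 0) & (col("a.is_open") == 0),
        # Decimal dividend / decimal divisor: under Spark's default decimal
        # rules this should keep enough scale (around Decimal(35, 22) here)
        # for the final cast to (38, 20) to round correctly.
        datediff(col("a.date_end_plan"), col("a.date_end_fact")).cast(DecimalType(24, 15))
        / months_in_year
    )
).cast(DecimalType(38, 20)).alias("loan_avg_plan_fact_closed_mm_cnt")

The intermediate types matter: Spark derives the result scale of a decimal division from both operands, and if the derived precision overflows 38 it will (by default) silently shrink the scale, down to as few as 6 digits, so casting both sides to DecimalType(38, 20) can actually lose more than it saves.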
If this isn't the solution, your best bet is to show the actual expression used (sometimes the string from df.explain(True) is enough): debug the DataFrame and look through the query plans, which will show where casts are being added. Often you'll see extra casts added around your own casts by some of Spark's internal rules on decimal handling (e.g. to prevent overflows); these rules have also changed as part of 3.4. For example:
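Here df is a stand-in for whatever DataFrame holds the aggregate above:

# Prints the parsed, analyzed, optimized and physical plans; look for
# Cast and CheckOverflow (and, before 3.4, PromotePrecision) nodes
# wrapped around your expression by the decimal-handling rules.
df.explain(True)

# Shows the resolved output type of each column, including the actual
# precision and scale Spark settled on.
df.printSchema()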