When using a Spark DataFrame, I call the cast() function with decimal(38,16); however, the result is calculated not to 16 decimal places but only to 14, with the remaining digits filled with zeros.
The relevant code is shown below:
from pyspark.sql.functions import avg, col, datediff, when
from pyspark.sql.types import DecimalType

avg(
    when(
        (col("a.is_debtor") == 1) & (col("a.is_kk") == 0) & (col("a.is_open") == 0),
        datediff(col("a.date_end_plan"), col("a.date_end_fact")) / (365.25 / 12)
    )
).cast(DecimalType(38, 20)).alias("loan_avg_plan_fact_closed_mm_cnt")
This outputs the following result:
+--------------------------------+
|loan_avg_plan_fact_closed_mm_cnt|
+--------------------------------+
|             19.3593429158110900|
+--------------------------------+
However, the result should look like this:
+--------------------------------+
|loan_avg_plan_fact_closed_mm_cnt|
+--------------------------------+
|             19.3593429158110870|
+--------------------------------+
I have tried putting CAST in every possible place in the code, to no avail. Interestingly, if I remove the division by (365.25 / 12), the result is correct. Please help me; I have been struggling with this error for a week.
You've likely already identified the problem yourself; proving it would be difficult without a reproducible sample, though.
(365.25 / 12) is a DoubleType, not a DecimalType, so the division is evaluated in double precision (roughly 15-17 significant digits) before your outer cast ever applies. Convert it to a Decimal with another cast around it of 38,20 to keep everything Decimal.
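Here's a minimal sketch of that fix. The precision/scale choices for the intermediate casts are my own assumptions, not something from your post; note that 365.25 / 12 is exactly 30.4375, so the literal itself is representable as a double, and the precision is only lost when the division runs in DoubleType:

from pyspark.sql.functions import avg, col, datediff, lit, when
from pyspark.sql.types import DecimalType

# Cast the divisor to Decimal so the division is performed in decimal
# arithmetic instead of being promoted to DoubleType.
months_in_year = lit(365.25 / 12).cast(DecimalType(6, 4))  # 30.4375

loan_avg = avg(
    when(
        (col("a.is_debtor") == 1) & (col("a.is_kk") == 0) & (col("a.is_open") == 0),
        # Decimal dividend / decimal divisor: under Spark's default decimal
        # rules this should keep enough scale (around Decimal(35, 22) here)
        # for the final cast to (38, 20) to round correctly.
        datediff(col("a.date_end_plan"), col("a.date_end_fact")).cast(DecimalType(24, 15))
        / months_in_year
    )
).cast(DecimalType(38, 20)).alias("loan_avg_plan_fact_closed_mm_cnt")

The intermediate types matter: Spark derives the result scale of a decimal division from both operands, and if the derived precision overflows 38 it will (by default) silently shrink the scale, down to as few as 6 digits, so casting both sides to DecimalType(38, 20) can actually lose more than it saves.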
If this isn't the solution, your best bet is to show the actual expression used (sometimes the string from df.explain(True) is enough): debug the DataFrame and look through the query plans, which will show where casts are being added. Often you'll see extra casts added around your own casts by some of Spark's internal rules on decimal handling (e.g. to prevent overflows); these rules have also changed as part of 3.4. For example:
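Here df is a stand-in for whatever DataFrame holds the aggregate above:

# Prints the parsed, analyzed, optimized and physical plans; look for
# Cast and CheckOverflow (and, before 3.4, PromotePrecision) nodes
# wrapped around your expression by the decimal-handling rules.
df.explain(True)

# Shows the resolved output type of each column, including the actual
# precision and scale Spark settled on.
df.printSchema()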