Format float to currency using PySpark and Babel


I'd like to convert a float to a currency using Babel and PySpark

sample data:

amount       currency
2129.9       RON
1700         EUR
1268         GBP
741.2        USD
142.08091153 EUR
4.7E7        USD
0            GBP

I tried:

df = df.withColumn(F.col('amount'), format_currency(F.col('amount'), F.col('currency'), locale='be_BE'))

or

df = df.withColumn(F.col('amount'), format_currency(F.col('amount'), 'EUR', locale='be_BE'))

They both give me an error:
(error screenshot omitted)


2 Answers

mck (Best Answer)

To use a Python library such as Babel with Spark dataframes, you need to wrap it in a UDF:

from babel.numbers import format_currency
import pyspark.sql.functions as F

# F.udf defaults to StringType, which suits a formatted currency string
format_currency_udf = F.udf(lambda a, c: format_currency(a, c))

df2 = df.withColumn(
    'amount',
    format_currency_udf('amount', 'currency')
)

df2.show()
+----------------+--------+
|          amount|currency|
+----------------+--------+
|     RON2,129.90|     RON|
|       €1,700.00|     EUR|
|       £1,268.00|     GBP|
|       US$741.20|     USD|
|         €142.08|     EUR|
|US$47,000,000.00|     USD|
+----------------+--------+
think-maths

There seems to be a problem in the pre-processing of the amount column of your dataframe. From the error it is evident that the value, once converted to a string, is not purely numeric, which it has to be for decimal.Decimal to parse it; it has some additional characters as well. Check the column for such values and remove the unnecessary characters to fix this. As an example:

>>> import decimal
>>> value = '10.0'
>>> value = decimal.Decimal(str(value))
>>> value
Decimal('10.0')
>>> value = '10.0e'
>>> value = decimal.Decimal(str(value))
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    value = decimal.Decimal(str(value))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]   # raised because '10.0e' is not purely numeric