How do I round the numbers in a df column correctly in Python?

97 Views Asked by At

The following is my df: https://www.dropbox.com/s/nbez3esbo8fedmf/aapl.csv?dl=0

date        ticker  open    high    low     close   adjClose
2019-07-08  AAPL    50.2025 50.35   49.6025 50.005  48.516
2019-07-09  AAPL    49.8    50.3775 49.7025 50.31   48.8119
2019-07-10  AAPL    50.4625 50.9325 50.39   50.8075 49.2946
2019-07-11  AAPL    50.8275 51.0975 50.4275 50.4375 48.9356
2019-07-12  AAPL    50.6125 51.0    50.55   50.825  49.3116
2019-07-15  AAPL    51.0225 51.4675 51.0    51.3025 49.7748
2019-07-16  AAPL    51.1475 51.5275 50.875  51.125  49.6026
2019-07-17  AAPL    51.0125 51.2725 50.8175 50.8375 49.3237
2019-07-18  AAPL    51.0    51.47   50.925  51.415  49.884

I'd like to round the close column to 2 decimal places. I tried the following:

df['close'] = round(df['close'], 2)
df.loc[:, 'close'] = df.loc[:, 'close'].round(2)
df.loc[:, 'close'] = df.loc[:, 'close'].apply(lambda x: Decimal(x).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP))
df.loc[:, 'close'] = df.loc[:, 'close'].apply(lambda x: Decimal(x).quantize(Decimal('0.01')))
df.loc[:, 'close'] = np.round(df.loc[:, 'close'], 2)

But the best I can do is this:

date        ticker  open    high    low     close   adjClose
2019-07-08  AAPL    50.2025 50.35   49.6025 50.01   48.516
2019-07-09  AAPL    49.8    50.3775 49.7025 50.31   48.8119
2019-07-10  AAPL    50.4625 50.9325 50.39   50.81   49.2946
2019-07-11  AAPL    50.8275 51.0975 50.4275 50.44   48.9356
2019-07-12  AAPL    50.6125 51.0    50.55   50.83   49.3116
2019-07-15  AAPL    51.0225 51.4675 51.0    51.30   49.7748
2019-07-16  AAPL    51.1475 51.5275 50.875  51.13   49.6026
2019-07-17  AAPL    51.0125 51.2725 50.8175 50.84   49.3237
2019-07-18  AAPL    51.0    51.47   50.925  51.41   49.884

The date 2019-07-18 should be 51.42, but I got 51.41. And depending on which of the five ways I used, some can't even round 2019-07-08 50.005 & 2019-07-12 50.825 appropriately because I got 50 and 50.82 instead of 50.01 and 50.83.

So how can I round it properly?

2

There are 2 best solutions below

3
Andrej Kesely On

You can read the CSV file as string (don't convert to floats) and then use Decimal to convert the numbers appropriately:

import pandas as pd
from decimal import Decimal, localcontext, ROUND_HALF_UP

df = pd.read_csv('aapl.csv', dtype=str)  # <-- read the values from csv as string

cols = ['open', 'high', 'low', 'close', 'adjClose']

with localcontext() as ctx:
    ctx.rounding = ROUND_HALF_UP
    df[cols] = df[cols].apply(lambda row: [f'{Decimal(v):.2f}' for v in row])

interesting_dates = ['2019-07-18', '2019-07-08', '2019-07-12']
print(df[df['date'].isin(interesting_dates)])

Prints:

         date ticker   open   high    low  close adjClose
0  2019-07-08   AAPL  50.20  50.35  49.60  50.01    48.52
4  2019-07-12   AAPL  50.61  51.00  50.55  50.83    49.31
8  2019-07-18   AAPL  51.00  51.47  50.93  51.42    49.88
0
AxelOruse On

The round() function in Python rounds to the nearest even number.

This allows to round both up and down algorithmically in a series of numbers, which avoids skewing the data. Most rounding functions used in Python use similar mechanisms in order to not skew your data upwards.