Issue with rounding data and NaN when using pandas.read_csv


I am trying to load a CSV into a data frame using pandas.read_csv. My data has a column of cell IDs that are 18 digits long, followed by columns with other data. Sometimes there is an empty entry, as shown below:

root_id_orig root_id_final coarse_id
648518346489344345 648518346489344345 local
648518346509145466
648518346489461189 648518346489461189 intersegmental

When I use pandas.read_csv, the empty entries are read in as NaN, which is good, but the 18-digit numbers are also rounded. I can force pandas to display all 18 digits, but the last two digits are then replaced seemingly at random, so that '648518346489344345' becomes '648518346489344308'.
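
The changed digits are consistent with the column being stored as float64 once it contains a missing value. A float64 has a 53-bit significand, so integers of this size can only be stored in steps of 128. A minimal sketch in plain Python (no pandas needed), using the ID quoted above; Python's `float` is the same IEEE 754 double as float64:

```python
# An 18-digit ID taken from the example above.
cell_id = 648518346489344345

# Converting to float snaps the value to the nearest representable double.
as_float = float(cell_id)
round_tripped = int(as_float)

# Between 2**59 and 2**60, consecutive doubles are 128 apart.
assert 2**59 <= cell_id < 2**60

print(round_tripped == cell_id)      # False: the exact ID is not representable
print(round_tripped % 128)           # 0: the stored value is a multiple of 128
print(abs(round_tripped - cell_id))  # never more than 64
```

This matches the pattern in the printed output further down, where every altered ID differs from the original by less than 64.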

I would like to load in the data and avoid this rounding issue, but still have something like NaN in the empty entries, so that I know to ignore them later. Alternatively, I could just drop the rows with the empty entries, since honestly I do that later anyway. Any advice?
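
One possible way to keep every digit while still marking the empty entries is to read the ID columns as text, so they are never converted to float. The sketch below assumes that holding the IDs as strings during loading is acceptable. The inline CSV text is a stand-in for the real file, built from the three sample rows above, and the comma delimiter is an assumption.

```python
import io

import pandas as pd

# Stand-in for the real file (assumed comma-delimited), from the sample rows.
csv_text = """root_id_orig,root_id_final,coarse_id
648518346489344345,648518346489344345,local
648518346509145466,,
648518346489461189,648518346489461189,intersegmental
"""

id_cols = ["root_id_orig", "root_id_final"]

# Read the ID columns as strings; empty fields are still marked as missing.
seg_ids = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in id_cols})

# Drop the rows whose final ID is missing.
complete = seg_ids.dropna(subset=["root_id_final"]).copy()

# With no missing values left, convert the strings back to exact integers.
for col in id_cols:
    complete[col] = complete[col].map(int)

print(complete)
```

Skipping the final conversion and keeping the IDs as strings would also work if they are only used as labels and never in arithmetic.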

Edit: actual pandas output --

test1 = csv of data with no spaces/missing entries
test2 = csv of data with missing entries

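import pandas as pd

# csv_path is the directory holding the test files (defined elsewhere)
# Show floats with no decimal places so all 18 digits are visible
pd.set_option('display.float_format', lambda x: '%18.0f' % x)
segIDs1 = pd.read_csv(csv_path+'test1.csv')
segIDs2 = pd.read_csv(csv_path+'test2.csv')
print(segIDs1)
print(segIDs2)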

segIDs1 prints as the following:

 root_id_orig       root_id_final       coarse_id
0    648518346492622267  648518346492622267           local
1    648518346490149896  648518346490149896  intersegmental
2    648518346475243320  648518346475243320           local
3    648518346486220960  648518346486220960           local
4    648518346486220960  648518346491547966  intersegmental
..                  ...                 ...             ...
348  648518346494699683  648518346526246871              MN
349  648518346491602705  648518346499802323           local
350  648518346492012120  648518346503946592           local
351  648518346476192927  648518346499062337           local
352  648518346493059320  648518346492999344  intersegmental

[353 rows x 3 columns]

segIDs2 prints as the following:

 root_id_orig      root_id_final       coarse_id
0    648518346492622267 648518346492622208           local
1    648518346490149896 648518346490149888  intersegmental
2    648518346475243320 648518346475243264           local
3    648518346486220960 648518346486220928           local
4    648518346486220960 648518346491547904  intersegmental
..                  ...                ...             ...
529  648518346475585266                NaN             NaN
530  648518346472501734                NaN             NaN
531  648518346471918758                NaN             NaN
532  648518346468216120                NaN             NaN
533  648518346468216120                NaN             NaN

[534 rows x 3 columns]