I'm reading data from an Excel spreadsheet with read_csv2 (the data uses ';' as a separator).
There are only two columns, let's call them product and count; count is always an integer or NA.
My problem is that the function seems to be adding '.000' to integer values < 1000 (for example, 15 becomes 15.000), and this is messing with the sum function.
Here's how I'm reading my data:
data = read_csv2('data.csv', col_types = c('c','n'), col_names = c('product','value'), skip = 1) %>% arrange(product)
Sample of the data:
product ; count
product1 ; 1,085
product2 ; 205
product3 ; 770
product4 ; 25
product5 ; 50
product6 ;
product7 ;
product8 ; 3,382
product9 ; 1,152
product10 ;
product11 ;
product12 ; 140
In this sample, the problem occurs with products 2-5 and 12.
This seems to be harmless when accessing individual values. Let's say, for example, that the 15th row contains one of these values:
it shows up in the data frame as 15.000, yet data$value[[15]] returns 15, and adding this value to another row that doesn't have this problem works just fine. For example, if the 16th row shows up correctly as 5674, data$value[[15]] + data$value[[16]] returns 5689.
However, when I use the sum() function, the extra '.000's seem to matter:
sum(data$value) returns around 7 million when it should be around 140k.
I've tried changing col_types from 'n' to 'i', which doesn't seem to matter, and
data$value = round(data$value) also does nothing.
If you read ?read_csv2, it says that:

Similar to the base equivalent functions (read.csv and read.csv2), the readr functions read_csv and read_csv2 require you to use the ','/'.' and ';'/',' options for separators and decimal points. If you want to change that behavior, you need to use read.table or readr::read_delim.

Demonstration:
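Here is a minimal sketch of that behaviour using parse_number(), the parser behind the 'n' column type (the literal "1,085" is just one of the values from your sample):

library(readr)

# read_csv2() fixes the locale to ',' as the decimal mark and '.' as the grouping mark,
# so "1,085" is parsed as a decimal number
parse_number("1,085", locale = locale(decimal_mark = ",", grouping_mark = "."))
#> [1] 1.085

# With readr's default locale, ',' is the grouping (thousands) mark instead
parse_number("1,085")
#> [1] 1085

So with read_csv2() the commas in values like 1,085 and 3,382 are read as decimal points, which is what turns the whole column into a non-integer one.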
And with your updated data, the fix is to read it with readr::read_delim, keeping ';' as the separator but leaving the default '.' decimal mark so that ',' is treated as a grouping mark.
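Something along these lines should work (a sketch only: it reuses the file name, column names, and the 'c'/'n' column types from your question; na.rm = TRUE is added because some counts are empty):

library(readr)
library(dplyr)

# Same ';' separator, but the default locale, so ',' is a grouping (thousands) mark
data <- read_delim('data.csv', delim = ';',
                   col_types = 'cn',
                   col_names = c('product', 'value'),
                   skip = 1,
                   trim_ws = TRUE) %>%   # the sample has spaces around ';'
  arrange(product)

sum(data$value, na.rm = TRUE)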