The aim here is to read a CSV table from a direct URL. I want to use fread (data.table package) because it is faster than read.csv, but I have a small problem.
options(scipen=999)
caracteristiques=read.csv(url("https://www.data.gouv.fr/s/resources/base-de-donnees-accidents-corporels-de-la-circulation/20160909-181230/caracteristiques_2015.csv"))
caracteristiques[1,1]
# 201500000001
I have no problem getting the [1,1] element.
Now I use fread:
library(data.table)
caracteristiques=data.table(fread("https://www.data.gouv.fr/s/resources/base-de-donnees-accidents-corporels-de-la-circulation/20160909-181230/caracteristiques_2015.csv",
sep=","))
caracteristiques[1,1]
#
This shows a strange number. I have to set options(scipen=0) to see it as 9.955423e-313.
I am wondering whether I have to specify some options in fread, since the first column contains large numbers.
fread automatically assumed the first column's class to be integer64 (this behaviour is documented under the integer64 argument in ?fread). The values in the first column are 201500000001, 201500000002, etc. Treated as numbers, they exceed the largest 32-bit integer R can store, 2^31 - 1 (2147483647). Thus fread interpreted them as integer64 values, which is why they look so strange: an integer64 column stores the bits of a 64-bit integer inside a double vector, so without bit64 to format them, R prints those bits as a tiny double such as 9.955423e-313.
data.table will automatically load the bit64 package for you in this situation so that the numbers display properly. However, when you don't have bit64 installed, as you likely don't, it is supposed to warn you and ask you to install it. That missing warning is fixed in the development version v1.10.5 (bug fix 5 in its NEWS). So, just
install.packages("bit64")
and you're good. You don't need to reload the data; it only affects how those columns are printed.
Alternatively, if you add the argument integer64 = "numeric" to your fread call, the result will match what you got from read.csv. But if it's an ID column, conceptually it should be character or factor rather than integer. You can use the argument colClasses = c("Num_Acc" = "character") for that.
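A minimal offline sketch of the points above, assuming a data.table version whose fread accepts the text argument (1.11.0 or later). The inline CSV is hypothetical: it only reuses the Num_Acc column name and the first two IDs quoted above, so nothing is downloaded. It shows the default integer64 column, the raw double hidden behind it, and the integer64 = "numeric" and colClasses alternatives.

```r
library(data.table)

# Hypothetical stand-in for the first rows of caracteristiques_2015.csv
csv <- "Num_Acc\n201500000001\n201500000002\n"

# Default: integers above .Machine$integer.max (2^31 - 1) are read as integer64
dt <- fread(text = csv, header = TRUE)
class(dt$Num_Acc)                        # "integer64"
201500000001 > .Machine$integer.max      # TRUE

# Dropping the class exposes the 64-bit integer's bits read as a double,
# i.e. the tiny number that is printed when bit64 cannot format the column
unclass(dt$Num_Acc)[1]                   # 9.955423e-313

# Option 1: read the column as double, matching read.csv
dt_num <- fread(text = csv, header = TRUE, integer64 = "numeric")
class(dt_num$Num_Acc)                    # "numeric"

# Option 2: treat the accident ID as character, since it is an identifier
dt_chr <- fread(text = csv, header = TRUE, colClasses = c(Num_Acc = "character"))
dt_chr$Num_Acc[1]                        # "201500000001"
```

Reading as double is exact here because these IDs are far below 2^53, the point beyond which doubles start skipping integers; the character option avoids that question entirely.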