Using the bit64 package, I am trying to create 64-bit integer hashes with xxhash64, similar to pyspark's pyspark.sql.functions.xxhash64.
With the digest package I can compute the xxhash64 as a hexadecimal string, but I am not able to convert it to an integer64:
str(digest::digest("foo", algo = "xxhash64"))
# chr "26af78b925a80acb"
For some strings I can go through as.numeric, but that loses precision, and for others it gives NA with the warning "NAs produced by integer64 overflow":
xxHash <- digest::digest("barfoo", algo = "xxhash64")
xxHash
# [1] "8107d13ec8130cad"
bit64::as.integer64(as.numeric(paste0("0x", xxHash)))
# integer64
# [1] <NA>
Is there a way to get xxhash64 values in the integer64 data type?
as.numeric() returns a double, which has a 53-bit significand, so it cannot hold every 64-bit value exactly. For a number between 2^61 and 2^62, such as the hash of "foo", adjacent doubles are 2^9 = 512 apart, so the value is rounded by up to 256 and the last few decimal digits can be wrong; hashes closer to 2^64 lose up to 11 bits. For more details on double precision in base R, see (https://stat.ethz.ch/R-manual/R-devel/library/base/html/double.html)
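One way to see the size of that rounding step is to compute the gap between adjacent doubles near the hash value. The helper `double_spacing()` below is hypothetical, written only for illustration in base R; it uses the fact that a double stores 52 fraction bits after the leading 1.

```r
# Gap between consecutive doubles around x: with a 53-bit significand
# (52 stored fraction bits), the spacing is 2^(exponent - 52)
double_spacing <- function(x) 2^(floor(log2(abs(x))) - 52)

# Hash of "foo", roughly 2.79e18, which lies between 2^61 and 2^62
x <- as.numeric("0x26af78b925a80acb")
double_spacing(x)
# [1] 512
```

So any integer in that range is rounded to a multiple of 512 when it passes through a double.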
This actually happens with your first example, "foo", as well. Let's combine as.numeric() and bit64::as.integer64:
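A minimal sketch of that combination for "foo", compared against the exact value 2787579430961679051 stated in the next paragraph (this assumes bit64 is installed):

```r
library(bit64)

hex <- "26af78b925a80acb"

# Going through a double first: the low bits are rounded away
approx <- as.integer64(as.numeric(paste0("0x", hex)))

# The exact value, parsed from its decimal string
exact <- as.integer64("2787579430961679051")

approx == exact
# [1] FALSE
```

The comparison has to be FALSE: the exact value is odd, while every integer a double can hold in this range is a multiple of 512.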
The result is close but not exact, because of the 53-bit precision. The exact integer64 value for that hexadecimal is 2787579430961679051. You can get this number with the Rmpfr package as follows:
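If you would rather not depend on Rmpfr, a different approach is to split the 16 hex digits into two 8-digit halves. Each half is below 2^32, so as.numeric reads it exactly, and the halves can then be recombined with integer64 arithmetic. This is a sketch that only works for digests below 2^63 (first hex digit 0 through 7); for larger ones the multiplication overflows and bit64 returns NA.

```r
library(bit64)

hex <- "26af78b925a80acb"

# Each 8-digit half is < 2^32, so it is represented exactly as a double
hi <- as.numeric(paste0("0x", substr(hex, 1, 8)))
lo <- as.numeric(paste0("0x", substr(hex, 9, 16)))

# value = hi * 2^32 + lo, computed in 64-bit integer arithmetic
res <- as.integer64(hi) * as.integer64(2^32) + as.integer64(lo)
res
# integer64
# [1] 2787579430961679051
```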
The same goes for xxHash:
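For "barfoo" there is an additional catch: 0x8107d13ec8130cad is larger than 2^63 - 1, the largest integer64, so its unsigned value cannot be stored in integer64 however precisely it is computed. That is what the "NAs produced by integer64 overflow" warning reports, and it applies to any digest whose first hex digit is 8 through f. pyspark's xxhash64 returns a long column, i.e. a signed 64-bit integer, so one option, assuming you want the 64 bits read the way a signed 64-bit integer reads them, is two's complement: subtract 2^64 when the top bit is set. The helper name `hex_to_int64()` is hypothetical; this is a sketch using only base R and bit64.

```r
library(bit64)

# Hypothetical helper: convert 16-digit hex strings to integer64,
# reading the bits as a two's-complement signed 64-bit integer
hex_to_int64 <- function(hex) {
  stopifnot(all(nchar(hex) == 16))
  hi <- as.numeric(paste0("0x", substr(hex, 1, 8)))   # exact, < 2^32
  lo <- as.numeric(paste0("0x", substr(hex, 9, 16)))  # exact, < 2^32

  # With the top bit set, the signed value is (hi - 2^32) * 2^32 + lo.
  # Rewrite it as (hi - 2^32 + 1) * 2^32 + (lo - 2^32) so the
  # intermediate product never reaches -2^63, which bit64 uses for NA.
  neg <- hi >= 2^31
  hi_s <- ifelse(neg, hi - 2^32 + 1, hi)
  lo_s <- ifelse(neg, lo - 2^32, lo)

  as.integer64(hi_s) * as.integer64(2^32) + as.integer64(lo_s)
}

res <- hex_to_int64(c("26af78b925a80acb", "8107d13ec8130cad"))
res
# values: 2787579430961679051 and -9149114050405004115
```

Digests whose first hex digit is 0 through 7 come out unchanged; the rest become negative. The single digest 8000000000000000 still maps to NA, because that bit pattern is exactly the one bit64 reserves for NA.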