I have a PostgreSQL database with more than 100,000 entries, and each entry has a bit array of size 10,000. Is there any method to accelerate the Hamming distance calculation of the bit arrays for the whole table? Thanks.
I tried different data types (bytea, text, and numeric) for storing the bit array, and for calculating the Hamming distance I tried XOR operations, text comparison, and numeric addition respectively for each data type. But I could not optimize the function enough: the operation currently takes almost 2 seconds, and the target is 200 milliseconds.
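For reference, a minimal sketch of the XOR-based calculation, assuming PostgreSQL 14 or later (for bit_count()), the arrays stored as bit(10000) rather than bytea, and a hypothetical table named entries:

```sql
-- Minimal sketch: Hamming distance as popcount(a XOR b).
-- Assumes PostgreSQL 14+ (bit_count) and a hypothetical table:
--   CREATE TABLE entries (id bigint PRIMARY KEY, bits bit(10000));
PREPARE hamming(bit(10000)) AS
SELECT id,
       bit_count(bits # $1) AS hamming_distance  -- "#" is XOR on bit strings
FROM entries
ORDER BY hamming_distance
LIMIT 10;
```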
There is no way to get good performance out of the Hamming distance here, because it is a recursive process with high algorithmic complexity and a very high memory footprint: https://www.cs.swarthmore.edu/~brody/papers/random14-hamming-distance.pdf
It is not well suited to big datasets such as those stored in an RDBMS.
Other comparison techniques exist that have lower complexity, no recursion, and a minimal memory footprint. They are not as accurate as the Hamming distance, but they can do a good job, such as the one I wrote:
See "inférence basique"
You can combine the two: first use inférence basique to reduce the candidate set, then compute the Hamming distance on the few remaining results, as in the sketch below.
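The inférence basique filter itself is not shown in this answer; as a stand-in illustration of the two-stage idea, the sketch below uses the popcount difference as the cheap first-stage filter, since |bit_count(a) - bit_count(b)| is a valid lower bound on the Hamming distance. It again assumes PostgreSQL 14+ and the hypothetical entries table with a bit(10000) column:

```sql
-- Two-stage sketch: cheap prefilter first, exact Hamming distance second.
-- Stage 1 relies on |popcount(a) - popcount(b)| <= hamming(a, b):
-- rows outside the popcount window can never be within the threshold.
-- An expression index makes stage 1 an index range scan:
CREATE INDEX IF NOT EXISTS entries_popcount_idx ON entries (bit_count(bits));

PREPARE two_stage(bit(10000), int) AS
SELECT id,
       bit_count(bits # $1) AS hamming_distance        -- stage 2: exact distance
FROM entries
WHERE bit_count(bits) BETWEEN bit_count($1) - $2       -- stage 1: popcount window
                          AND bit_count($1) + $2
  AND bit_count(bits # $1) <= $2                       -- keep only true matches
ORDER BY hamming_distance;
```

Only the rows that survive the popcount window pay for the full 10,000-bit XOR, which is where a two-stage approach like this saves time.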