How can a function that computes Hamming distance be accelerated for larger datasets in PostgreSQL?


I have a PostgreSQL database with more than 10,000 rows, and each row has a bit array of size 10,000. Is there any way to accelerate the Hamming distance calculation over the bit arrays for the whole table? Thanks.

I tried different data types (bytea, text, and numeric) for storing the bit array, and for the Hamming distance I tried XOR operations, text comparison, and numeric addition respectively for each type. But I could not make the function fast enough: it currently takes almost 2 seconds, and the target is 200 milliseconds.
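For reference, the XOR-plus-popcount idea can be sketched in Python (a hedged illustration of the technique, not the actual PostgreSQL function). In PostgreSQL 14 or later, the equivalent for `bit(n)` columns is `bit_count(a # b)`, since `#` is bitwise XOR on bit strings and `bit_count` counts the set bits.

```python
# Illustration only: pack a 10,000-element bit array into one Python integer,
# XOR two rows, and count the differing bits. The names here are made up for
# the sketch, not taken from the question's code.
import random

N_BITS = 10_000

def pack(bits):
    """Pack a list of 0/1 values into a single Python integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def hamming(a, b):
    """Hamming distance of two equal-length bit arrays packed as integers."""
    return bin(a ^ b).count("1")

random.seed(0)
row1 = [random.randint(0, 1) for _ in range(N_BITS)]
row2 = [random.randint(0, 1) for _ in range(N_BITS)]

d = hamming(pack(row1), pack(row2))
# d equals the number of positions where row1 and row2 differ
assert d == sum(x != y for x, y in zip(row1, row2))
```

Per comparison this is O(n) in the number of bits, so a whole-table scan is still 10,000 rows × 10,000 bits; the per-row constant factor (one XOR plus one popcount over packed words) is about as small as it gets.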

1 Answer

Answered by SQLpro:

It is hard to get good performance for exact Hamming distance on a large table: every comparison has to touch all 10,000 bits of each row, the computation cannot use an ordinary B-tree index, and a full-table scan has a very high memory footprint. https://www.cs.swarthmore.edu/~brody/papers/random14-hamming-distance.pdf

So it is not practical to apply it directly to large datasets inside an RDBMS.

Other comparison techniques exist with lower complexity and a minimal memory footprint. They are not as accurate as the Hamming distance, but they can do a good job, such as the one I wrote:

See "inférence basique" ("basic inference").

You can combine the two: first use the inférence basique filter to reduce the candidate set, then compute the exact Hamming distance on only the few remaining rows.
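The filter-then-refine idea can be sketched as follows. The prefilter used here is an assumption for illustration (it is not the answer's "inférence basique"): since |popcount(a) − popcount(b)| is a lower bound on hamming(a, b), rows whose precomputed popcount differs from the query's by more than the threshold can be skipped without touching their bits.

```python
# Two-stage search sketch: a cheap integer comparison on a precomputed
# popcount column (stage 1), then exact Hamming distance on the survivors
# (stage 2). All names and the threshold are hypothetical.
import random

def popcount(x):
    return bin(x).count("1")

def hamming(a, b):
    return bin(a ^ b).count("1")

random.seed(1)
rows = [random.getrandbits(10_000) for _ in range(1000)]
counts = [popcount(r) for r in rows]   # precomputed once, e.g. a stored column

query = random.getrandbits(10_000)
q_count = popcount(query)
threshold = 4800

# Stage 1: cheap prefilter, no bit arrays touched.
candidates = [r for r, c in zip(rows, counts) if abs(c - q_count) <= threshold]

# Stage 2: exact Hamming distance only on the candidates.
matches = [r for r in candidates if hamming(query, r) <= threshold]
```

The stage-1 bound is safe (it can only discard rows whose exact distance is guaranteed to exceed the threshold), so the two-stage result is identical to scanning the whole table, just cheaper when the filter is selective.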