I am new to python and trying to create a bloomFilter based on Bit torrent BEP 33. I have created Bloom Filter but it is not exactly what I am looking for. Here's what I need and I have not completely understood this situation. If someone here can explain ...
//fixed parameters
k = 2
m = 256*8
//the filter
byte[m/8] bloom ## What is this part?
function insertIP(byte[] ip) {
byte[20] hash = sha1(ip)
int index1 = hash[0] | hash[1] << 8
int index2 = hash[2] | hash[3] << 8
// truncate index to m (11 bits required)
index1 %= m ## ?
index2 %= m ## ?
// set bits at index1 and index2
bloom[index1 / 8] |= 0x01 << index1 % 8 ## ??
bloom[index2 / 8] |= 0x01 << index2 % 8 ## ??
}
// insert IP 192.168.1.1 into the filter:
insertIP(byte[4] {192,168,1,1})
And this is what I have created
import hashlib
m = 2048
def hashes(s):
index = [0, 0]
#for c in s:
#o = ord(c)
index[0] = hashlib.sha224(index[0]).hexdigest ## This needs integer hash
index[1] = hashlib.sha224(index[1]).hexdigest ## same as above
return [x % m for x in index]
class BloomFilter(object):
def __init__(self):
self.bitarray = [0] * m
def add(self, s):
for x in hashes(s):
self.bitarray[x] = 1
#print self.bitarray
def query(self, s):
return all(self.bitarray[x] == 1 for x in hashes(s))
shazib=BloomFilter()
shazib.add('192.168.0.1')
print shazib.query('192.168.0.1')
First, an explanation of the code...
This is the most baffling line to me;
kisn't used at all...This is the number of bits in 256 bytes.
bloomis an array of 256 bytes, i.e. 256 * 8 bits, i.e.mbits. Each bit inbloomwill contain information about what values are in the filter.This creates a 20-byte hash of
ip.These two lines calculate two indices into
bloombased on the hash. Basically,index1is the concatenation of the first two bytes ofhash, andindex2is the concatenation of the second two bytes ofhash.These two lines truncate the values so that they don't exceed the range of possible indices into
bloom. The%is the mod operator; it returns the remainder after division. (17 % 4 = 1, 22 % 5 = 2 and so on.) Remember that bloom is 256 * 8 bits long? Eleven bits allows us to encode 2 ** 11 possible indices, i.e. 2048 values, i.e. 256 * 8 values.We're treating
bloomas a bit-array, so we have to do some bit-twiddling to access the correct bit. First, divideindexAby 8, to get the correct byte, then truncateindexAusing the%operator to get the correct bit within that byte.And voila, we have a bloom filter. If you printed it out bitwise, it would look like this:
And if a particular i.p., when hashed, generates an
index1of5and anindex2of9, then it would be considered "in" the filter, because the bits at indices5and9are set to1. Of course, there can be false positives, because multiple different values could result in the same indices; but there can be no false negatives.Here's your first problem.
index[0]andindex[1]need to be integers. Also,hashlib.sha224(index[0]).hexdigestreturns a method. You have to call the method to get anything out of it, like this:hashlib.sha224(index[0]).hexdigest(). Also, if you want this to work in a way that's identical to the above code, you could convert the hash to an int (you can useint(x, 16)to convert a hexadecimal string into an integer) and then extract the first two bytes using& 65535, then shift it by two bytes using>> 16, then extract those two bytes using& 65535again. Once you've got that correct, the rest works.