I am implementing a hash set in C and am using linear probing to deal with collisions. My question is, what happens when you have something like this:
a set with capacity of 32
value
ahas hash 14, inserted at idx 14value
bhas hash 14, colliding witha, so it is inserted at idx 15value
ais deleted, leaving idx 14 emptyvalue
c, which is equal to valueb, has hash 14 and can be inserted at idx 14 since it is empty from the deletion ofa
As you can see, value c should not be inserted into the set because it is equal to value b, but the slot in the table is empty, so my program thinks it can be inserted because of the lack of searching for b after the hash index.
My first thought is to linearly search the entire table to ensure that there are no duplicate values before inserting, but that removes the O(1) insertion, deletion, and lookup time for a set.
My second thought is to have a structure like this at each of the slots in the table:
typedef struct {
bool has_value;
bool has_collision
void *value;
} set_entry;
But, not only does it require more memory, the state of has_collision would not really be able to tell the truth, since values can be deleted from the set.
Is there a more efficient way to do this?
To allow elements to be deleted from the hash set, you must be able to distinguish between empty slots that have never been used and freed slots that have been made available when their value has been deleted.
To probe if a key is present in the hash table, you must iterate over used and freed slots, only stopping on empty slots.
When you insert a key into the hash set, first test if it is present, and if not, use the first available (empty or freed) slot from its hash value.
When deleting a key from the hash set, set its entry to empty if the next entry is empty and to freed otherwise. If you can set the entry as empty, you can also set all previous consecutive freed entries as empty.
You do not need extra boolean fields to keep track of these states, if the
valuemember is avoid *, you can use 2 unused pointers to represent empty and freed slots, for example the address of the hash table and that of the second element, orNULLand another invalid pointer if thevaluepointer is required to be a valid pointer.