I am trying to find out what the internal load factor is for the Python sets. For dictionary which uses a hash table with a load factor of 0.66 (2/3) is. The number of buckets start at 8 and when the 6th key is inserted the number of buckets increases to 16 The table below shows the shift in buckets.
| bucket | shift |
|---|---|
| 8 | 5 |
| 16 | 10 |
| 32 | 21 |
| 64 | 42 |
| 128 | 85 |
This can be seen with de following Python code where the size of a dictionary and sets is shows with the getsizeof method:
import sys
d = {}
s = set()
for x in range(25):
d[x] = 1
s.add(x)
print(len(d), sys.getsizeof(d), sys.getsizeof(s))
| # of elements | memory used for dict | memory used for sets |
|---|---|---|
| 1 | 232 | 216 |
| 2 | 232 | 216 |
| 3 | 232 | 216 |
| 4 | 232 | 216 |
| 5 | 232 | 728 |
| 6 | 360 | 728 |
| 7 | 360 | 728 |
| 8 | 360 | 728 |
| 9 | 360 | 728 |
| 10 | 360 | 728 |
| 11 | 640 | 728 |
| 12 | 640 | 728 |
| 13 | 640 | 728 |
| 14 | 640 | 728 |
| 15 | 640 | 728 |
| 16 | 640 | 728 |
| 17 | 640 | 728 |
| 18 | 640 | 728 |
| 19 | 640 | 2264 |
| 20 | 640 | 2264 |
| 21 | 640 | 2264 |
| 22 | 1176 | 2264 |
| 23 | 1176 | 2264 |
| 24 | 1176 | 2264 |
| 25 | 1176 | 2264 |
The above table shows that the shift in the buckets correct is for the dictionary, but not for the sets. The memory in sets is different.
I am trying to find out what the load factor is for a set. Is that also 2/3? Or am I doing something wrong with the code?
Currently, it's about 3/5. See the source:
fillis the number of occupied table cells (including "deleted entry" markers), andmaskis 1 less than the total table capacity.