hashlib.md5 returns a weird object and indexing is also strange

So I wanted to get to know how hashlib.md5 works and produced the following code:

import hashlib

a = b'yolo'

h = hashlib.md5(a).digest()
b = h[6:10]

print(h)
print(b)

Don't mind the fact that I used "yolo" as a string. This is just for testing.

Now when running this code, it produces

b'O\xde\xd1FG6\xe7xe\xdf#,\xbc\xb4\xcd\x19'
b'\xe7xe\xdf'

which quite frankly seems to be off. First of all, I expected 4 bytes (bytes 6-9, both included) to come out in the second line, and the first part (the \xe7xe) is not even a byte (afaik).

The documentation says that I should get a bytes object from the call to digest(), but for some reason this seems to not be the case(?..). My understanding is that a bytes object is just a list of bytes (and the function should thus produce an output like b'\x0f\xff\x75...' or whatever and never produce an output containing \xe7xe or start with a letter). What am I misunderstanding here?

There are 2 best solutions below

Brian61354270 On BEST ANSWER

b'\xe7xe\xdf' is a bytes object containing exactly four bytes:

>>> [hex(b) for b in b'\xe7xe\xdf']
['0xe7', '0x78', '0x65', '0xdf']

It just so happens that two of those bytes fall in the printable ASCII range, so they're shown as characters instead of \x## escape sequences.

  • 0x78 is x
  • 0x65 is e
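As a rough illustration of that display rule, the sketch below walks over the same four bytes and renders each one the way the repr appears to. The render helper is hypothetical and simplified: it only checks the printable ASCII range and ignores the special cases the real repr also handles (backslash, quotes, and \t, \n, \r).

```python
import hashlib

digest = hashlib.md5(b'yolo').digest()
chunk = digest[6:10]  # bytes at indices 6, 7, 8 and 9


def render(value):
    """Hypothetical helper: approximate how repr() displays one byte value."""
    # Simplification: printable ASCII is shown as the character itself,
    # everything else as a \xNN escape. Backslash and quotes are not handled.
    if 0x20 <= value <= 0x7e:
        return chr(value)
    return f'\\x{value:02x}'


# Iterating over a bytes object yields ints in range(256).
for value in chunk:
    print(f'{value:#04x} -> {render(value)}')

print(len(chunk))  # 4
```

The slice really does hold four bytes; two of them just happen to print as the letters x and e.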

For confirmation, you can compare b'\xe7\x78\x65\xdf' with b'\xe7xe\xdf'. They're two different representations of the exact same bytes:

>>> b'\xe7\x78\x65\xdf' == b'\xe7xe\xdf'
True

For a more consistent human-readable representation, you can convert the bytes object to a hex string using its hex method (the separator argument requires Python 3.8 or later):

>>> b'\xe7xe\xdf'.hex(' ')
'e7 78 65 df'

Or you can get a hex string from the start by calling hash.hexdigest instead of hash.digest.
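To make the relationship between the two methods concrete, here is a minimal sketch using the same b'yolo' input as the question. The variable names raw and text are just for illustration.

```python
import hashlib

raw = hashlib.md5(b'yolo').digest()      # bytes object, 16 bytes for MD5
text = hashlib.md5(b'yolo').hexdigest()  # str, 2 hex characters per byte

print(type(raw).__name__, len(raw))    # bytes 16
print(type(text).__name__, len(text))  # str 32

# Both carry the same information and convert into each other.
print(raw.hex() == text)           # True
print(bytes.fromhex(text) == raw)  # True
```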

Daweo On

If you prefer to see only hex codes in the representation, use .hexdigest(), which returns a str:

import hashlib
h = hashlib.md5(b'yolo').hexdigest()
print(h)  # 4fded1464736e77865df232cbcb4cd19

Keep in mind that every byte is now represented by 2 characters, so, for example, to get the first 3 bytes you should do

print(h[:6])  # 4fded1
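The same doubling applies to any slice. As a sketch, the bytes 6-9 from the question correspond to characters 12-19 of the hex string; start and stop are illustrative names, not part of hashlib.

```python
import hashlib

raw = hashlib.md5(b'yolo').digest()
text = hashlib.md5(b'yolo').hexdigest()

start, stop = 6, 10  # byte indices, stop is exclusive

# Each byte takes two hex characters, so both bounds are doubled.
hex_slice = text[2 * start:2 * stop]
print(hex_slice)  # e77865df

# Slicing the raw digest and converting afterwards gives the same result.
print(raw[start:stop].hex() == hex_slice)  # True
```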