Going between hex bytes and strings


I am sure the solution is easy, but I haven't found it online. I might just not be searching correctly.

I want to be able to easily convert between a character and its bit and hex representations in Python. I usually do it as follows:

If I have a character, say chr(97) (the letter "a"), I can get its byte representation with

byte_character=chr(97).encode()

or its hex representation with

hex_character=chr(97).encode().hex()

and to go back I can use

bytes.fromhex(hex_character).decode()

byte_character.decode()
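A minimal sketch of the round trip described above, assuming the default UTF-8 encoding throughout; `char_to_hex` and `hex_to_char` are hypothetical helper names used only for illustration:

```python
def char_to_hex(ch: str) -> str:
    # str.encode() defaults to UTF-8; bytes.hex() gives two hex digits per byte
    return ch.encode().hex()


def hex_to_char(hex_string: str) -> str:
    # bytes.fromhex() parses the hex digits back into bytes; decode() reverses UTF-8
    return bytes.fromhex(hex_string).decode()


print(char_to_hex(chr(97)))   # 61
print(hex_to_char("61"))      # a
print(char_to_hex(chr(140)))  # c28c (two bytes, not one)
```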

This works fine for most characters, but for some of them the encoding uses more than one byte. An example is chr(140), which encodes to 2 bytes:

chr(140).encode()

gives

b'\xc2\x8c'

rather than just b'\x8c' as I expected. Can you explain what I am doing wrong?

Best answer

If all you need is the 0 .. 255 byte range, you can use the latin1 (ISO-8859-1) encoding:

>>> chr(140).encode('latin1')
b'\x8c'
>>> chr(255).encode('latin1')
b'\xff'
>>> chr(256).encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0100' in position 0: ordinal not in range(256)
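Going back works the same way as in the question, as long as the same codec is used in both directions. A minimal sketch, assuming every character is in the 0-255 range; `to_hex_latin1` and `from_hex_latin1` are hypothetical names:

```python
def to_hex_latin1(text: str) -> str:
    # latin1 maps code points 0-255 one-to-one onto byte values 0-255
    return text.encode('latin1').hex()


def from_hex_latin1(hex_string: str) -> str:
    # Decode with the same codec: decoding b'\x8c' as UTF-8 would raise UnicodeDecodeError
    return bytes.fromhex(hex_string).decode('latin1')


print(to_hex_latin1(chr(140)))            # 8c
print(from_hex_latin1('8c') == chr(140))  # True
```

Because latin1 is a one-to-one mapping, every character from chr(0) to chr(255) survives the round trip unchanged.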

Your original attempt was using the UTF-8 encoding by default, which emits multiple bytes for code points above 127:

>>> chr(140).encode()
b'\xc2\x8c'
>>> chr(127).encode()
b'\x7f'
>>> chr(128).encode()
b'\xc2\x80'
>>> chr(12345).encode()
b'\xe3\x80\xb9'
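The number of bytes UTF-8 uses depends only on which range the code point falls in. The hypothetical helper `utf8_length` below spells out those ranges (as defined in RFC 3629) and checks them against `str.encode()`:

```python
def utf8_length(code_point: int) -> int:
    # UTF-8 byte count by code point range (RFC 3629)
    if code_point <= 0x7F:
        return 1  # ASCII range: a single byte
    if code_point <= 0x7FF:
        return 2  # includes 128-255, e.g. chr(140)
    if code_point <= 0xFFFF:
        return 3  # e.g. chr(12345)
    return 4      # up to 0x10FFFF


for cp in (97, 127, 128, 140, 12345, 0x1F600):
    encoded = chr(cp).encode()
    assert len(encoded) == utf8_length(cp)
    print(cp, encoded)
```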
Answer by Theodore Spanbauer

UTF-8 (the default encoding for str.encode() unless another one is specified) encodes any Unicode code point above 127 (0x7F) in two or more bytes, so when x > 127, chr(x).encode() returns two or more bytes.
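To see where `\xc2\x8c` comes from: a two-byte UTF-8 sequence has the bit pattern `110xxxxx 10xxxxxx`, and the 11 bits of the code point are split across the `x` positions. A minimal sketch that handles only code points 0x80-0x7FF; `encode_two_byte` is a hypothetical name, not a standard function:

```python
def encode_two_byte(code_point: int) -> bytes:
    # Only valid for code points that UTF-8 stores in exactly two bytes
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point outside the two-byte UTF-8 range")
    lead = 0b11000000 | (code_point >> 6)           # '110' + top 5 bits
    trail = 0b10000000 | (code_point & 0b00111111)  # '10' + low 6 bits
    return bytes([lead, trail])


print(encode_two_byte(140))                       # b'\xc2\x8c'
print(encode_two_byte(140) == chr(140).encode())  # True
```

For 140 (binary 10001100), the top bits give 140 >> 6 = 2, so the lead byte is 0xC0 | 2 = 0xC2, and the low 6 bits give 12, so the trailing byte is 0x80 | 12 = 0x8C.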

There are two ways to fix this.

Use the 'latin1' encoding, which maps each code point from 0 to 255 to a single byte.

x = chr(140)
byte = x.encode('latin1')
print(byte)
# output: b'\x8c'

Another solution is the built-in int method to_bytes. It works directly on the integer, so there is no need to convert it to a character first. The first argument is the number of bytes to return, and the second is the byte order.

x = 140
byte = x.to_bytes(1, 'little')
print(byte)
# output: b'\x8c'
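For the reverse direction, `int.from_bytes` turns the byte back into an integer, and `chr()` gives the character again. A small sketch of the full round trip for a single value in the 0-255 range, also showing the hex and bit strings the question asks about:

```python
n = 140

raw = n.to_bytes(1, 'little')         # one byte: b'\x8c'
hex_string = raw.hex()                # '8c'
bits = format(n, '08b')               # '10001100'
back = int.from_bytes(raw, 'little')  # 140 again
char = chr(back)                      # same character as chr(140)

print(raw, hex_string, bits, back)
```

With a single byte the byte order makes no difference, and values above 255 raise OverflowError unless a larger length is passed to to_bytes.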

Both of these should work fine for what you are trying to accomplish.