Python - Unicode & Character Encodings in Python


I run into a lot of encoding and decoding problems. I have read a lot about the topic, but I still can't get a grip on it; the logic keeps escaping me. To extend my knowledge I am reading through: https://realpython.com/python-encodings-guide/#enter-unicode

The guide contains this example, which I ran, but I get a different result from the one shown on the website:

import requests

from bs4 import BeautifulSoup

print("résumé".encode("utf-8"))
# b'r\xc3\xa9sum\xc3\xa9'
print("El Niño".encode("utf-8"))
# b'El Ni\xc3\xb1o'

print(b"r\xc3\xa9sum\xc3\xa9".decode("utf-8"))
# 'r�sum�'
print(b"El Ni\xc3\xb1o".decode("utf-8"))
# 'El Ni�o'

As you can see, decoding gives me the replacement character (a diamond shape with a question mark) in place of the accented letters. The correct result should be 'résumé' and 'El Niño'. What is happening here? What am I doing wrong?

I use Visual Studio Code with Python 3.12.2 and Beautiful Soup 4.12.3 on Windows 11. Files encoding is set to: [image: Files encoding setting]
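
A minimal sketch for narrowing this down, assuming the decode itself may be fine and only the way the console displays the text is off. The helper name `describe` is hypothetical and introduced here for illustration only. It inspects the decoded string without printing any non-ASCII characters, so the output does not depend on the terminal's encoding:

```python
import sys


def describe(text: str) -> list[str]:
    """Return the code points of `text` as ASCII-only 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Same bytes as in the example above; the literal is pure ASCII,
# so the source file's encoding does not affect it.
decoded = b"r\xc3\xa9sum\xc3\xa9".decode("utf-8")

# ascii() escapes every non-ASCII character, so this prints safely anywhere.
print(ascii(decoded))

# U+00E9 is the code point of 'é'; seeing it here means decode() worked.
print(describe(decoded))

# Compare against the expected string written with escapes only.
print(decoded == "r\u00e9sum\u00e9")

# The encoding Python uses when writing text to this console.
print(sys.stdout.encoding)
```

If the comparison prints True and the code points include U+00E9, the string was decoded correctly, and the encoding used for console output would be the more likely suspect. In that case, setting the environment variable PYTHONIOENCODING to utf-8, or calling `sys.stdout.reconfigure(encoding="utf-8")` before printing, is one thing that could be tried.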
