I use Visual Studio Code with Python 3.12.2 and BeautifulSoup 4.12.3 on Windows 11. The file encoding is set to UTF-8.
This is my code sample:
import requests
import urllib.parse
from urllib.parse import quote
from bs4 import BeautifulSoup

for topic in range(13717, 13718):
    url = 'https://www.scale-rc-car.com/forum/showthread.php?t=' + str(topic) + '&pp=1&page=1'
    print(url)
    html_content = requests.get(url)
    soup = BeautifulSoup(html_content.text, 'html.parser')
print(url) results in the constructed url with the correct topic number (13717):
https://www.scale-rc-car.com/forum/showthread.php?t=13717&pp=1&page=1
and that is correct and what I want.
But here's the rub: I get the often-posted "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 64: invalid continuation byte".
The thing is, as soon as the html_content = requests.get(url) statement is executed, the URL seems to change to:
https://www.scale-rc-car.com/forum/showthread.php?13717-Buggy-d-%E9tag%E8re-Team-Associated-RC10CC&pp=20
I can check that by pasting the constructed URL (https://www.scale-rc-car.com/forum/showthread.php?t=13717&pp=1&page=1) into a web browser: when I hit ENTER, the URL changes and gains the phrase -Buggy-d-%E9tag%E8re-Team-Associated-RC10CC
As you can see, the characters é and è are replaced by %E9 and %E8, respectively, and the result is the UnicodeDecodeError. The question is:
How can I avoid or error-trap this problem?
Extra info: I don't know beforehand whether there will be special characters in the URL.
This is the complete error message:
PS C:\xampp\htdocs\python> python dumpy.py
https://www.scale-rc-car.com/forum/showthread.php?t=13717&pp=1&page=1
Traceback (most recent call last):
File "C:\xampp\htdocs\python\dumpy.py", line 10, in <module>
html_content = requests.get(url)
^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\api.py", line 73, in get
return request("get", url, params=params, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\sessions.py", line 725, in send
history = [resp for resp in gen]
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\sessions.py", line 175, in resolve_redirects
url = self.get_redirect_target(resp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\sessions.py", line 124, in get_redirect_target
return to_native_string(location, "utf8")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\bartz\AppData\Roaming\Python\Python312\site-packages\requests\_internal_utils.py", line 33, in to_native_string
out = string.decode(encoding)
^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 64: invalid continuation byte
PS C:\xampp\htdocs\python>
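The last two frames show requests decoding the redirect's Location header as UTF-8. A minimal sketch of why that step fails and of one way to repair such a header, assuming the server sends it as ISO-8859-1 bytes and as a relative URL (both are assumptions, as is the helper name repair_redirect, which is not part of requests):

```python
import string
from urllib.parse import quote, urljoin

# Thread slug from the redirected URL, with the accented characters restored.
slug = "13717-Buggy-d-étagère-Team-Associated-RC10CC"

# Assumption: the server writes the Location header as ISO-8859-1 bytes.
raw = slug.encode("iso-8859-1")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xE9 opens a multi-byte UTF-8 sequence that the next byte ("t") cannot continue.
    utf8_ok = False


def repair_redirect(base_url: str, location: str) -> str:
    """Percent-encode a Latin-1 Location value and resolve it against base_url.

    Hypothetical helper; `location` is the header as a str decoded as ISO-8859-1.
    """
    escaped = quote(location, encoding="iso-8859-1", safe=string.punctuation)
    return urljoin(base_url, escaped)


base = "https://www.scale-rc-car.com/forum/showthread.php?t=13717&pp=1&page=1"
# Assumed shape of the Location header; the real header may differ.
fixed = repair_redirect(base, "showthread.php?" + slug + "&pp=20")
print(utf8_ok)
print(fixed)
```

With requests, one possible use would be to call requests.get(url, allow_redirects=False), read the Location header from the response, and request repair_redirect(url, location) instead of letting requests follow the redirect itself.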
Get the redirected URL using urllib.request (Extensible library for opening URLs); see final_url below.
All prints are merely for debugging purposes.
Output:
.\SO\78094322.py
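A minimal sketch of that approach, assuming urllib.request's redirect handling percent-encodes the non-ASCII bytes in the Location header (it appears to in recent CPython versions). The helper name resolve_final_url and the timeout value are hypothetical, not taken from the answer:

```python
import urllib.request


def resolve_final_url(url: str, timeout: float = 10.0) -> str:
    """Open url with urllib.request, following redirects, and return the final URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # After any redirects, response.url holds the URL that was actually fetched.
        return response.url


# Example call (needs network access, so it is left commented out):
# final_url = resolve_final_url(
#     "https://www.scale-rc-car.com/forum/showthread.php?t=13717&pp=1&page=1"
# )
# print(final_url)
```

The resulting final_url could then be passed to requests.get(final_url) and on to BeautifulSoup; since it is already percent-encoded, it should no longer lead requests into a redirect it cannot decode.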