Python 3.8 urllib issues with read

I'm trying to do a basic GET against a local server with Python 3.8 and am receiving no data back, whereas my old code for Python 2.7 worked fine.

Here is the Python 3.8 code I tried:

import http.client
import urllib.request

def getData():
    data = ""
    strEPAaddress = '192.168.3.1'
    # Raise http.client's header limit (the default is 100)
    http.client._MAXHEADERS = 1000
    try:
        # debuglevel > 0 prints the request and the response headers
        handler = urllib.request.HTTPHandler(debuglevel=10)
        opener = urllib.request.build_opener(handler)
        resp = opener.open("http://%s:8080" % strEPAaddress)
        data = resp.read()
    except ConnectionError as e:
        print("Connection Error: {}".format(e))
    except Exception as e:
        # Handle exceptions appropriately
        print("Error: {}".format(e))
    return data

It connects and the server replies with 200 OK, but resp.read() returns an empty bytes object. The debug output is:

send: b'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: 192.168.3.1:8080\r\nUser-Agent: Python-urllib/3.8\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: WebSilicon HTTPd
header: Last-modified: Tue, 18 Nov 2007 13:09:42 GMT
header: Cache-Control: no-store
header: Connection: close
[('Server', 'WebSilicon HTTPd'), ('Last-modified', 'Tue, 18 Nov 2007 13:09:42 GMT'), ('Cache-Control', 'no-store'), ('Connection', 'close')]

b''

My Python 2.7 code, on the other hand, is pretty much identical:

import urllib, httplib

def getData():
    data = ""
    strEPAaddress = '192.168.3.1'
    try:
        httplib.HTTPConnection.debuglevel = 1
        resp = urllib.urlopen("http://%s:8080" % strEPAaddress)
        data = resp.read()
    except Exception as e:
        print("Error: {}".format(e))
    return data

It gives me the following when printing the response info:

send: 'GET / HTTP/1.0\r\nHost: 192.168.3.1:8080\r\nUser-Agent: Python-urllib/1.17\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: WebSilicon HTTPd
header: Last-modified: Tue, 18 Nov 2007 13:09:42 GMT
header: Cache-Control: no-store
header: Connection: close

<TITLE>**********</TITLE>
</HEAD>
<BODY bgcolor=white topmargin=0>
<P align=center><br>
<font face=arial size=6 color=Black><B>****** Status Summary</B></font><br><br>
<TABLE WIDTH=100% BORDERCOLOR=Black ALIGN=center BORDER=1 CELLSPACING=0 CELLPADDING=0>
<TABLE WIDTH=90% BORDERCOLOR=Black ALIGN=center BORDER=1 CELLSPACING=0 CELLPADDING=0>
<TR>
    <TD bgColor="75b4b4" width="90%"><font size=4>&nbsp;<B>System Info</B></font></TD>
</TABLE>
<TABLE WIDTH=90% BORDERCOLOR=Black ALIGN=center BORDER=1 CELLSPACING=0 CELLPADDING=0>
<TR>
    <TD bgColor="9EB7B4"  width="25%">&nbsp;Manufacturer</TD>
-----------

I also checked with Wireshark and saw that the server responded to both GETs with the full HTML block.

I'm not sure what I'm missing on my end. Initially I got the error about more than 100 headers, which is why I added the line http.client._MAXHEADERS = 1000.

Wireshark response headers:

Python 2.7 https://imgur.com/a/ilaFYFD

Python 3.8 https://imgur.com/a/ICNeMu7

Hex data right before the HTML from both Python versions: https://imgur.com/a/GSu60Ib

Answer by AKX

To make my comment an actual answer:

The device you're connecting to appears to mishandle the HTTP/1.1 request that Python 3's urllib sends. Python 2's urllib sent an HTTP/1.0 request, which the device handled correctly.

For this very simple use case, you can just emulate a basic HTTP/1.0 connection by sending the known-good request that Python 2 sent:

import socket

# Send the same HTTP/1.0 request that Python 2's urllib sent
s = socket.create_connection(("192.168.3.1", 8080))
s.sendall(b'GET / HTTP/1.0\r\nHost: 192.168.3.1:8080\r\nUser-Agent: Python-urllib/1.17\r\n\r\n')

# The server closes the connection when it is done, so read until EOF
response = b""
while True:
    chunk = s.recv(4096)
    if not chunk:
        break
    response += chunk
s.close()

print(response)  # b"HTTP/1.1 200 OK\r\n...

Parsing the response should be easy enough; headers, _, body = response.partition(b"\r\n\r\n") is likely a good start.
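
As a rough illustration of that parsing step, here is a minimal sketch. The function name split_http_response and the sample bytes are made up for this example, and the fallback to bare "\n\n" is only an assumption about what a non-conforming device might send, not something confirmed for this server.

```python
def split_http_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # A blank line (CRLF CRLF) normally separates the headers from the body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # Assumption: tolerate bare LF line endings from a sloppy server.
        head, sep, body = raw.partition(b"\n\n")

    lines = head.decode("iso-8859-1").splitlines()
    status_line = lines[0] if lines else ""

    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as dates stay intact.
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical sample, shaped like the debug output in the question.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: WebSilicon HTTPd\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<TITLE>Status</TITLE>"
)
status_line, headers, body = split_http_response(sample)
print(status_line)
print(headers["server"])
print(body)
```

Header names are lowercased so lookups do not depend on the device's capitalisation. Repeated headers overwrite each other in this sketch, which is fine for a simple status page but not for general use.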