I'm attempting to download the following file using Python: Dallas DCAD 2024 Appraisals
The download works in my browser, but when I try to do it in Python I'm redirected to an Error page. The response content is the HTML of Errors.aspx instead of the zip binary data.
Here is what I've tried:
import requests
url = 'https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2024_CURRENT.ZIP'
headers = {
"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
r = requests.get(url, allow_redirects=True, headers=headers, timeout=None)
print(f"URL: {r.url}")
print(f"Status Code: {r.status_code}")
for i, h in enumerate(r.history):
    print(f"History[{i}] URL: {h.url}")
    print(f"History[{i}] Status: {h.status_code}")
    print(f"History[{i}] Headers: {h.headers}")
Output:
URL: https://www.dallascad.org/Errors/ErrorPage.aspx?aspxerrorpath=/ViewPDFs.aspx
Status Code: 200
History[0] URL: https://www.dallascad.org/ViewPDFs.aspx?type=3&id=%5CDCAD.ORG%5CWEB%5CWEBDATA%5CWEBFORMS%5CDATA%20PRODUCTS%5CDCAD2024_CURRENT.ZIP
History[0] Status: 302
History[0] Headers: {'Cache-Control': 'private', 'Content-Type': 'text/html; charset=utf-8', 'Location': '/Errors/ErrorPage.aspx?aspxerrorpath=/ViewPDFs.aspx', 'Server': 'Microsoft-IIS/8.5', 'Content-Disposition': 'attachment;filename=DCAD2024_CURRENT.ZIP', 'X-AspNet-Version': '4.0.30319', 'X-Powered-By': 'ASP.NET', 'Date': 'Tue, 26 Mar 2024 14:35:36 GMT', 'Content-Length': '168'}
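The id parameter contains significant backslashes. In a regular string literal the leading \\ collapses to a single backslash, which is why History[0] shows the request going out as id=%5CDCAD.ORG... (one backslash) rather than id=%5C%5CDCAD.ORG... (two). Therefore you need to write the URL as a raw string.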
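To see the effect of the literal type in isolation, here is a small sketch using only the standard library. The variable names are illustrative, and `urllib.parse.quote` is used only to approximate the percent-encoding that requests applies to the backslashes.

```python
from urllib.parse import quote

# Regular literal: the escape sequence \\ is read as ONE backslash.
normal = '\\DCAD.ORG'
# Raw literal: both backslashes are kept as written.
raw = r'\\DCAD.ORG'

print(len(normal), len(raw))  # 9 10
print(quote(normal))          # %5CDCAD.ORG
print(quote(raw))             # %5C%5CDCAD.ORG
```

The single `%5C` matches what appears in the History[0] URL in the question.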
The site does not require any headers.
Therefore:
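A minimal sketch of that fix, assuming the server returns the zip once both leading backslashes reach it. The `download` helper, the 60-second timeout, the chunk size, and the local file name are illustrative choices, not part of the original question.

```python
import requests

# Raw string (r'...') keeps both leading backslashes of the id parameter intact.
URL = r'https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2024_CURRENT.ZIP'


def download(url: str, dest: str, timeout: float = 60.0) -> None:
    """Stream the response body to `dest` without loading it all into memory."""
    # No custom headers are sent; redirects are followed by default.
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        # The failure in the question was a redirect to the error page
        # that still ended in a 200, so check the final URL explicitly.
        if 'ErrorPage.aspx' in r.url:
            raise RuntimeError(f'Redirected to error page: {r.url}')
        with open(dest, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1 << 16):
                f.write(chunk)
```

Calling `download(URL, 'DCAD2024_CURRENT.ZIP')` would then write the archive to the current directory. Streaming is used here because the file is a data archive and may be large.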