I'm trying to reimplement the code below, which sends requests to Google Search, using aiohttp. My solution seems equivalent, but for some reason it does not set cookies as desired. Any help?

    import os
    from http.cookiejar import LWPCookieJar
    from urllib.request import Request, urlopen

    USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'

    # home_folder is assumed to be defined elsewhere (not shown in this excerpt).
    cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
    cookie_jar.load()

    def get_page(url, user_agent=None, verify_ssl=True):
        if user_agent is None:
            user_agent = USER_AGENT
        request = Request(url)
        request.add_header('User-Agent', user_agent)
        # Attach stored cookies to the outgoing request.
        cookie_jar.add_cookie_header(request)
        response = urlopen(request)
        # Store any cookies set by the response.
        cookie_jar.extract_cookies(response, request)
        html = response.read()
        response.close()
        try:
            cookie_jar.save()
        except Exception:
            pass
        return html

My solution:

    import aiohttp

    USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
    abs_cookie_jar = aiohttp.CookieJar()
    abs_cookie_jar.load('.aiogoogle-cookie')

    async def get_page(url, user_agent=None, verify_ssl=True):
        if user_agent is None:
            user_agent = USER_AGENT
        async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
            response = await session.get(url)
            if response.cookies:
                abs_cookie_jar.update_cookies(cookies=response.cookies)
                abs_cookie_jar.save('.aiogoogle-cookie')
            html = await response.text()
            return html

What happens is that when you head to google.com, you get redirected. As a result, three HTTP requests are performed, with response codes 301, 302, and 200 (you can display them by accessing the response.history attribute).

The Set-Cookie header is added to the first response, but what you have in the response variable is the last one, which does not contain cookies.

The update part in your implementation:

    abs_cookie_jar.update_cookies(cookies=response.cookies)

is not needed, as aiohttp does that automatically for all requests (see the source).

How your solution could be fixed:
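A minimal sketch of one possible fix, based on the explanation above: drop the manual update_cookies call, let the session fill the jar while it follows the redirects, and save the jar unconditionally instead of only when the final response carries cookies. The cookie_file parameter and the os.path.exists guard are assumptions added for illustration, the jar is created inside the coroutine as a precaution so that it is built under a running event loop, and the unused verify_ssl parameter is left out.

```python
import asyncio
import os

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
COOKIE_FILE = '.aiogoogle-cookie'  # file name taken from the question


async def get_page(url, user_agent=None, cookie_file=COOKIE_FILE):
    # cookie_file is a hypothetical parameter, not part of the original code.
    if user_agent is None:
        user_agent = USER_AGENT

    # Create the jar inside the coroutine and load it only if the file exists.
    cookie_jar = aiohttp.CookieJar()
    if os.path.exists(cookie_file):
        cookie_jar.load(cookie_file)

    async with aiohttp.ClientSession(
        headers={'User-Agent': user_agent}, cookie_jar=cookie_jar
    ) as session:
        async with session.get(url) as response:
            # The redirect responses are available in response.history;
            # `response` itself is the final one.
            html = await response.text()

    # The session has already stored the cookies from every response,
    # including the redirects, so no update_cookies call is needed.
    cookie_jar.save(cookie_file)
    return html
```

With this version, running asyncio.run(get_page('https://www.google.com')) should write the cookies from the redirect chain to .aiogoogle-cookie, even though the final response sets none.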