Here is my question: I cannot get the page at 'http://example.com/#sharplink'. The site also goes into an infinite redirect loop, so I used a custom redirect handler, which requires enabling cookielib.
Here is my code:
import urllib2, urllib, cookielib

# Note: this sets the User-Agent for urllib's FancyURLopener only;
# it does not change the User-Agent sent by urllib2 openers.
urllib.FancyURLopener.version = 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.3) Gecko/2008092814 (Debian-3.0.1-1)'

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ('GET', 'HEAD')
                or code in (301, 302, 303) and m == 'POST'):
            # Follow the redirect, dropping headers that describe a request body.
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k, v) for k, v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type"))
            return urllib2.Request(newurl,
                                   headers=newheaders,
                                   origin_req_host=req.get_origin_req_host(),
                                   unverifiable=True)
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)

# Keep cookies across redirects so the site does not loop forever.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(MyHTTPRedirectHandler, urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

req = urllib2.Request('http://example.com/goto/#sharplink')
response = urllib2.urlopen(req)
f = open('bet', 'w')
f.write(response.read())
f.close()
But every time I only get the 'http://example.com/goto' page, not the page for the fragment (#sharplink). Please help!
The fragment part of a URL ("sharplink") is not sent to the web server (it's commonly used to point to a specific section of the page that a link refers to), so it doesn't matter whether you request
http://example.com/goto/ or http://example.com/goto/#sharplink. If you expect the pages to be different, then most likely the site uses an AJAX framework which encodes state in the fragment part of the URL. As
urllib and friends do not execute JS, you'd need to use something like phantomjs to get the content of the page.
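
To see that both URLs lead to the same request, here is a minimal sketch. It assumes Python 3, where the relevant functions live in urllib.parse and urllib.request (the question uses Python 2's urllib2). The helper names split_fragment and request_path are made up for illustration. No network access is needed, since the code only inspects how the URL is split.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL."""
    result = urldefrag(url)
    return result.url, result.fragment


def request_path(url):
    """Return the path that urllib would use as the request target."""
    # Request separates the fragment from the URL when it is constructed,
    # so the selector holds only the path (and query, if there is one).
    return Request(url).selector


if __name__ == "__main__":
    for url in ("http://example.com/goto/",
                "http://example.com/goto/#sharplink"):
        # Both URLs should yield the same request path.
        print(url, "->", split_fragment(url), request_path(url))
```

Since the request path is the same for both URLs, the server has no way to tell them apart. Any difference you see in a browser comes from JavaScript reading the fragment after the page has loaded.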