python urllib, urllib2: how to get "#" (sharp/fragment) links


Okay, my dear helpers, here is the question: I cannot get ' http://example.com/#sharplink '. By the way, the site was getting into an infinite redirect loop, so I used a redirect handler, and that needs the cookie library enabled.

Here is my code:

import urllib2, urllib, cookielib


# Note: this only changes the User-Agent used by urllib's openers,
# not the urllib2 opener built below.
urllib.FancyURLopener.version = 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.3) Gecko/2008092814 (Debian-3.0.1-1)'

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ('GET', 'HEAD')
                or code in (301, 302, 303) and m == 'POST'):
            newurl = newurl.replace(' ', '%20')
            # Drop body-related headers when the redirect turns the request into a GET.
            newheaders = dict((k, v) for k, v in req.headers.items()
                              if k.lower() not in ('content-length', 'content-type'))
            return urllib2.Request(newurl,
                                   headers=newheaders,
                                   origin_req_host=req.get_origin_req_host(),
                                   unverifiable=True)
        else:
            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)


cj = cookielib.CookieJar()

opener = urllib2.build_opener(MyHTTPRedirectHandler, urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

req = urllib2.Request('http://example.com/goto/#sharplink')

response = urllib2.urlopen(req)

f = open('bet', 'wb')
f.write(response.read())
f.close()

But every time I only get the ' http://example.com/goto ' page, never the page behind the '#' link. Please help!

1 Answer

dorian:

The fragment part of a URL ("sharplink") is not sent to the web server; it is conventionally used to point at a specific section of the page the link refers to. So it doesn't matter whether you request http://example.com/goto/ or http://example.com/goto/#sharplink: the server sees exactly the same request.
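You can see this yourself with the standard urlparse module: the fragment is parsed into its own component and never becomes part of the HTTP request line. A minimal illustration (Python 2, using the URL from the question):

import urlparse

url = 'http://example.com/goto/#sharplink'
parts = urlparse.urlsplit(url)

# The fragment lives in its own field, separate from the path.
print parts.path      # '/goto/'
print parts.fragment  # 'sharplink'

# urldefrag() splits the fragment off explicitly, which is effectively
# what happens before the request reaches the server.
print urlparse.urldefrag(url)  # ('http://example.com/goto/', 'sharplink')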

If you expect the pages to be different, then most likely the site uses an AJAX framework which encodes state in the fragment part of the URL. As urllib and friends do not execute JS, you'd need to use something like phantomjs to get the content of the page.
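For instance, here is a rough sketch using Selenium to drive PhantomJS (this assumes you have installed the selenium package and the phantomjs binary; the URL and output filename are simply taken from the question):

from selenium import webdriver

# PhantomJS runs the page's JavaScript, so content loaded based on the
# '#sharplink' fragment ends up in the rendered DOM.
driver = webdriver.PhantomJS()
driver.get('http://example.com/goto/#sharplink')

html = driver.page_source  # the DOM after the scripts have run

with open('bet', 'w') as f:
    f.write(html.encode('utf-8'))

driver.quit()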