Python 3.X Playing with the internet

I'm doing a small project to help my work go faster. I currently have a program written in Python 3.2 that does almost all of the manual labour for me, with one exception: I need to log on to the company website (username and password), choose a month and year, and click download. I would like to write a little program to do that as well, so that the whole process is done by the program.

I have looked into it, but I can only find tools for 2.X. I have also looked into urllib, and I know that some of the 2.X modules (such as urllib2) are now in urllib.request.

I have even found some code to start it off; however, I'm confused about how to put it into practice.

Here is what I have found:

import urllib2

theurl = 'http://www.someserver.com/toplevelurl/somepage.htm'

username = 'johnny'
password = 'XXXXXX'
# a great password

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
# this creates a password manager
passman.add_password(None, theurl, username, password)
# because we have put None at the start it will always
# use this username/password combination for  urls
# for which `theurl` is a super-url

authhandler = urllib2.HTTPBasicAuthHandler(passman)
# create the AuthHandler

opener = urllib2.build_opener(authhandler)

urllib2.install_opener(opener)
# All calls to urllib2.urlopen will now use our handler
# Make sure not to include the protocol in with the URL, or
# HTTPPasswordMgrWithDefaultRealm will be very confused.
# You must (of course) use it when fetching the page though.

pagehandle = urllib2.urlopen(theurl)
# authentication is now handled automatically for us

All credit to Michael Foord and his page: Basic Authentication

So I changed the code around a bit and replaced every 'urllib2' with 'urllib.request'.
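For reference, making that substitution on the snippet above gives roughly the following Python 3 code. It is a sketch: the URL and credentials are still the placeholders from the quoted code, and the helper names `make_basic_auth_opener` and `fetch` are hypothetical. Note that this handler only helps if the site uses HTTP Basic authentication (the browser shows a pop-up login dialog); a site with an HTML login form needs a different approach.

```python
import urllib.request

# Placeholders from the quoted snippet
theurl = 'http://www.someserver.com/toplevelurl/somepage.htm'
username = 'johnny'
password = 'XXXXXX'

def make_basic_auth_opener(url, user, pwd):
    """Return an opener that answers HTTP Basic auth challenges for `url`."""
    # Realm None: use these credentials for any realm under `url`
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url, user, pwd)
    # The handler replies to a 401 "Basic" challenge with the stored credentials
    authhandler = urllib.request.HTTPBasicAuthHandler(passman)
    return urllib.request.build_opener(authhandler)

def fetch(url, opener):
    """Download the raw bytes of `url` through the authenticating opener."""
    with opener.open(url) as pagehandle:
        return pagehandle.read()
```

Usage: `data = fetch(theurl, make_basic_auth_opener(theurl, username, password))`. Calling `urllib.request.install_opener(...)` on the opener instead, as the original does, makes plain `urllib.request.urlopen` use it too.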

Then I learned how to open a web page. My plan is for the program to open the page, log in with the username and password, and then (once I've learned how) download the files from it.

import webbrowser
ie = webbrowser.get('c:\\program files\\internet explorer\\iexplore.exe')
ie.open(theurl)

(I know Explorer is garbage; I'm just using it to test, and then I'll switch to Chrome ;) )

But that doesn't open the page with the login data entered; it simply opens the page as though you had typed in the URL.

How do I get it to open the page using the password handler? I sort of understand how Michael created the handlers, but I'm not sure which one to use to actually open the website.

Also, as an afterthought: might I need to look into cookies?

Thanks for your time


Best answer (score 3):

You are confusing two things here. webbrowser is a wrapper around your actual web browser, and urllib is a library for HTTP- and URL-related work. They don't know about each other and serve very different purposes.

In older IE versions, you could encode the HTTP Basic Auth username and password in the URL like so: http(s)://Username:Password@Server/Resource.ext. I believe Firefox and Chrome still support that, but IE dropped it: http://support.microsoft.com/kb/834489/EN-US
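The `Username:Password@` form is only a shorthand; what actually travels to the server is an `Authorization` header containing the base64-encoded `username:password` pair. A script can send that header itself, with no browser involved, as in this minimal sketch (the helper names are hypothetical). Base64 is an encoding, not encryption, so prefer https when sending credentials this way.

```python
import base64
import urllib.request

def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    # base64("username:password"), prefixed with the scheme name "Basic"
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def authed_request(url, username, password):
    """Build a request that carries the credentials up front."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req
```

Passing the result to `urllib.request.urlopen(...)` sends the credentials on the first request instead of waiting for a 401 challenge.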

If you want to emulate a browser rather than just open a real one, take a look at mechanize: http://wwwsearch.sourceforge.net/mechanize/

Answer (score 1):

Your browser doesn't know anything about the authentication you've done in Python (and that has nothing to do with whether your browser is garbage or not). The webbrowser module simply offers convenience functions for launching a browser and pointing it at a URL. You can't "transfer" your credentials to the browser.

As for migrating from Python 2 to Python 3: the 2to3 tool can convert simple scripts like yours automatically.

Answer (score 1):

They are not running in the same environment.

You need to figure out what actually happens when you click the download button. Use your browser's developer tools to see the format of the POST request the website sends, then build the same request in Python to fetch the file.
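With only the standard library, replaying those requests might look like the sketch below. It also covers the cookie question: most form-based logins hand back a session cookie, so the opener gets a cookie jar that sends it back on later requests. The URLs, form-field names (`username`, `password`, `month`, `year`), and helper names are all hypothetical; copy the real ones from what the developer tools show.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoints: replace with the URLs the real forms post to
LOGIN_URL = "http://www.someserver.com/login"
DOWNLOAD_URL = "http://www.someserver.com/download"

def make_opener():
    """Return an opener that keeps cookies (e.g. a session cookie) between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def form_post(url, fields):
    """Build a POST request with an application/x-www-form-urlencoded body."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")

def download_report(opener, username, password, month, year, dest):
    # 1. Log in; the server is assumed to reply with a session cookie
    opener.open(form_post(LOGIN_URL, {"username": username, "password": password})).close()
    # 2. Replay the download form; the cookie jar sends the session cookie back
    with opener.open(form_post(DOWNLOAD_URL, {"month": month, "year": year})) as resp:
        with open(dest, "wb") as f:
            f.write(resp.read())
```

Usage: `opener, jar = make_opener()` once, then `download_report(opener, 'johnny', 'XXXXXX', '05', '2013', 'report.csv')`.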

Requests is a nice library that makes this kind of thing much easier.
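With Requests, logging in and then fetching the file might look like this sketch. A `requests.Session` keeps cookies between calls, so no explicit cookie jar is needed. As before, the URLs and form-field names are hypothetical placeholders to be replaced with what the developer tools show.

```python
import requests

# Hypothetical endpoints and field names
LOGIN_URL = "http://www.someserver.com/login"
DOWNLOAD_URL = "http://www.someserver.com/download"

def download_report(username, password, month, year, dest):
    # The session stores any cookies the login sets and sends them on later requests
    with requests.Session() as session:
        resp = session.post(LOGIN_URL, data={"username": username, "password": password})
        resp.raise_for_status()  # stop early on a 4xx/5xx response
        resp = session.post(DOWNLOAD_URL, data={"month": month, "year": year})
        resp.raise_for_status()
        with open(dest, "wb") as f:
            f.write(resp.content)  # raw bytes of the downloaded file
```

Passing a dict as `data=` makes Requests form-encode it, just like a browser submitting an HTML form.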

Answer (score 0):

I would use Selenium. Here is some code from a little script of mine, hacked about a bit to give you an idea:

from selenium import webdriver
from selenium.webdriver.common.by import By

def get_name():
    user = 'johnny'
    passwd = 'XXXXXX'
    # Originally a headless HtmlUnit browser behind a Selenium server:
    # webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
    driver = webdriver.Chrome()
    try:
        driver.get('http://www.someserver.com/toplevelurl/somepage.htm')
        assert 'Page Title' in driver.title
        # Fill in and submit the login form (element names are placeholders)
        driver.find_element(By.NAME, 'name_of_userid_box').send_keys(user)
        driver.find_element(By.NAME, 'name_of_password_box').send_keys(passwd)
        driver.find_element(By.NAME, 'name_of_login_button').click()
        # The browser keeps the login session, so this page loads as logged in
        driver.get('http://www.someserver.com/toplevelurl/page_with_download_button.htm')
        assert 'page_with_download_button title' in driver.title
        driver.find_element(By.NAME, 'download_button').click()
        # Call driver.quit() only after the download has finished
    except Exception as exc:
        print('process failed:', exc)

I'm new to Python, so that may not be the best code ever written, but it should give you the general idea.

Hope it helps