Choosing a Python web-scraping framework for handling pure JavaScript-based sites


I'm a Python programmer specializing in web scraping. I had to ask this question because I couldn't find anything relevant.

I want to know which popular, well-documented Python frameworks are available for scraping pure JavaScript-based sites. Currently I know Mechanize and Beautiful Soup, but they do not interact with JavaScript, so I'm looking for something different. I would prefer something as elegant and simple as Mechanize.

I've done a bit of research and so far I've heard about Selenium, Selenium 2 and Windmill.

Right now I'm trying to choose one of these three, and I don't know of any others.

So can anyone point out the features of these frameworks and what makes them different? I heard that Selenium uses a separate server to do all its tasks and that it seems to be feature-rich. Also, what is the core difference between Selenium and Selenium 2? Please enlighten me if I'm wrong, and if you know of any other frameworks, do mention their features and other details.

Thanks.

Answer by stefanw

Before using tools like Selenium, which are designed for front-end testing rather than scraping, you should look at where the data on the site comes from. Find out which XHR requests are made, what parameters they take and what they return.
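Once you have copied a request URL from the browser's developer tools (network tab, filtered to XHR), it helps to split it into the endpoint and its parameters so you can see which ones to vary. A minimal sketch using only the standard library; the URL below is a hypothetical placeholder, not the site from the question:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs


def split_xhr_url(captured_url):
    """Split a request URL copied from the network tab into
    the bare endpoint and a dict of its query parameters."""
    parts = urlsplit(captured_url)
    # Rebuild the URL without the query string or fragment
    endpoint = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # parse_qs returns a list per key because keys may repeat;
    # flatten keys that appear only once for readability
    params = {
        key: values[0] if len(values) == 1 else values
        for key, values in parse_qs(parts.query, keep_blank_values=True).items()
    }
    return endpoint, params


# Hypothetical captured URL, for illustration only
endpoint, params = split_xhr_url("https://example.com/api/search?q=python&page=2")
print(endpoint)  # https://example.com/api/search
print(params)    # {'q': 'python', 'page': '2'}
```

With the parameters in a dict, you can change one value at a time (e.g. the page number) and check whether the endpoint still answers as expected.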

For example, the site you mentioned in your comment makes a POST request with lots of parameters from JavaScript and then displays the result. You probably only need to send that POST request yourself and use its result to get your data.
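A minimal sketch of replaying such a POST request with Python's standard library, assuming the endpoint accepts form-encoded data and returns JSON (many do, but some return HTML fragments instead). The endpoint, parameter names and the `X-Requested-With` header are assumptions for illustration; copy the real values from the request your browser actually sends:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(endpoint, form_params, headers=None):
    """Build a POST request that mimics a form-encoded XHR call."""
    data = urlencode(form_params).encode("utf-8")
    req = Request(endpoint, data=data, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Assumption: some servers check this header to distinguish XHR calls
    req.add_header("X-Requested-With", "XMLHttpRequest")
    # Extra headers seen in the browser (e.g. Referer, cookies) can be added here
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req


def fetch_json(req, timeout=30):
    """Send the request and decode the body, assuming it is JSON."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (hypothetical endpoint and parameters; not sent here):
# data = fetch_json(build_post_request("https://example.com/search",
#                                      {"query": "python", "page": 1}))
```

If the server rejects the replayed request, compare its headers and cookies with the ones the browser sent; session cookies or anti-CSRF tokens are common reasons, and only then is a browser-driving tool like Selenium worth reaching for.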