Scraping data from a dynamic ecommerce webpage


I'm trying to scrape the titles of all the products listed on a page of an e-commerce site (in this case, Flipkart). Which products get scraped depends on the keyword entered by the user. A typical URL generated for the keyword 'XYZXYZ' would be:

http://www.flipkart.com/search?q=XYZXYZ&as=off&as-show=on&otracker=start

Using this link as a template, I wrote the following script to scrape the titles of all the products listed for a given keyword:

import requests
from bs4 import BeautifulSoup

def flipp(k):
    url = "http://www.flipkart.com/search?q=" + str(k) + "&as=off&as-show=on&otracker=start"
    ss = requests.get(url)
    src = ss.text
    obj = BeautifulSoup(src)
    for e in obj.findAll("a", {'class' : 'lu-title'}):
        title = e.string
        print unicode(title)

h = raw_input("Enter a keyword:")
print flipp(h)

However, the above script returns None as the output. When I tried to debug at each step, I found that the requests module is unable to get the source code of the webpage. What seems to be happening over here?

2

There are 2 best solutions below

10
Md. Mohsin (BEST ANSWER)

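This does the trick. Instead of requiring the exact class lu-title, it uses a regular expression to match every anchor whose class contains "-title":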

import requests
from bs4 import BeautifulSoup
import re

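def flipp(k):
    url = "http://www.flipkart.com/search?q=" + str(k) + "&as=off&as-show=on&otracker=start"
    ss = requests.get(url)
    src = ss.text
    obj = BeautifulSoup(src)
    # Match any <a> whose class contains "-title", not only "lu-title"
    for e in obj.findAll("a",class_=re.compile("-title")):
        title = e.text  # .text also works when the tag has nested children
        print title.strip()

h = raw_input("Enter a keyword:") # I used 'Python' here
print flipp(h)  # flipp() returns nothing, so this also prints a trailing None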

Out[1]:
Think Python (English) (Paperback)
Learning Python (English) 5th  Edition (Hardcover)
Python in Easy Steps : Makes Programming Fun ! (English) 1st Edition (Paperback)
Python : The Complete Reference (English) (Paperback)
Natural Language Processing with Python (English) 1st Edition (Paperback)
Head First Programming: A learner's guide to programming using the Python language (English) 1st  Edition (Paperback)
Beginning Python (English) (Paperback)
Programming Python (English) 4Th Edition (Hardcover)
Computer Science with Python Language Made Simple - (Class XI) (English) (Paperback)
HEAD FIRST PYTHON (English) (Paperback)
Raspberry Pi User Guide (English) (Paperback)
Core Python Applications Programming (English) 3rd  Edition (Paperback)
Write Your First Program (English) (Paperback)
Programming Computer Vision with Python (English) 1st Edition (Paperback)
An Introduction to Python (English) (Paperback)
Fundamentals of Python: Data Structures (English) (Paperback)
Think Complexity (English) (Paperback)
Foundations of Python Network Programming: The comprehensive guide to building network applications with Python (English) 2nd Edition (Soft Cover)
Python Programming for the Absolute Beginner (English) (Paperback)
EXPERT PYTHON PROGRAMMING BEST PRACTICES FOR DESIGNING,CODING & DISTRIBUTING YOUR PYTHON 1st Edition (Paperback)
None
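
To see why the regex match picks up more links than the exact class lookup, here is a minimal Python 3 sketch that runs against a static snippet rather than the live site. The markup, the class names other than lu-title, and the helper name extract_titles are made up for illustration; the real page structure may differ.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page (assumption).
SAMPLE_HTML = """
<div>
  <a class="lu-title" href="/p/1">Think Python</a>
  <a class="fk-title" href="/p/2">  Learning Python  </a>
  <a class="nav-link" href="/help">Help</a>
</div>
"""

def extract_titles(html):
    """Return the stripped text of every <a> whose class contains '-title'."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", class_=re.compile("-title"))
    return [a.get_text().strip() for a in links]

# Exact lookup, as in the question: only the class "lu-title" matches.
exact = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("a", {"class": "lu-title"})

titles = extract_titles(SAMPLE_HTML)
for t in titles:
    print(t)
```

With this snippet the exact lookup finds one link, while the regex version also finds the fk-title link and skips the navigation link.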
3
Simeon Visser

The problem is that flipp has no return statement and you're therefore printing None (which is the default return value of any Python function in the absence of a return statement).

It could be that you're using keywords that have no results, but that script fetches a page just fine for me.
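
The missing return statement can be reproduced without any network access. The sketch below is Python 3 and uses made-up stand-in data; print_titles and collect_titles are hypothetical names, not part of the original script. One function only prints, like flipp, and the other returns its results so the caller can decide what to do with them.

```python
def print_titles(titles):
    """Mimics flipp(): prints each title but has no return statement."""
    for title in titles:
        print(title.strip())

def collect_titles(titles):
    """Returns the cleaned titles instead of printing them."""
    return [title.strip() for title in titles]

sample = ["  Think Python ", "Learning Python"]  # hypothetical stand-in data

result = print_titles(sample)  # prints both titles as a side effect
print(result)                  # prints None, because nothing was returned

cleaned = collect_titles(sample)
print(cleaned)
```

In the original script, either call flipp(h) without the surrounding print, or have flipp build and return a list of titles as collect_titles does.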