Python scraper for Facebook Marketplace isn't working anymore


Since yesterday, my Facebook Marketplace scraper has stopped fetching any data. I'm using Scrapy because of its features. Am I making a mistake somewhere? The output is shared in my gist.

Current code below

from scrapy import Spider
import logging


class Facebook(Spider):
    name = 'facebook'
    start_urls = ["https://www.facebook.com/marketplace/112047398814697/search?query=funko&sortBy=creation_time_descend&radius=500"]

    def parse(self, response):

        from pdb import set_trace; set_trace()
        # get all HTML product elements
        products = response.xpath('//div[@style="max-width:1872px"]/div[2]/div')
        # iterate over the list of products
        for product in products:
            # yield one scraped item per product
            yield {
                "name": product.css("h2::text").get(),
                "image": product.css("img").attrib["src"],
                "price": product.css('span::text').getall()[2],
                "url": product.css("a").attrib["href"],
            }

I've already tried Selenium and requests-html, but they don't work as expected either.

Answer by Lakshmanarao Simhadri

The listing data is now embedded as JSON inside a <script> tag instead of being rendered as HTML elements, which is why your selectors no longer find any product details. Select the text of that script, parse it into a dictionary with json.loads, and read the fields you need from it. You can find the code snippet below.

from scrapy import Spider
import json


class Facebook(Spider):
    name = 'facebook'
    start_urls = ["https://www.facebook.com/marketplace/112047398814697/search?query=funko&sortBy=creation_time_descend&radius=500"]

    def parse(self, response):
        # Get the JSON string from the <script> tag that holds the listings
        json_str = response.xpath("//script[contains(text(),'MarketplaceFeedListingStoryObject')]/text()").get()
        if json_str is None:
            # Page layout changed or the request was blocked: log instead of raising IndexError
            self.logger.warning("No listing JSON found on %s", response.url)
            return
        data_dict = json.loads(json_str)
        products = data_dict['require'][0][3][0]['__bbox']['require'][0][3][1]['__bbox']['result']['data']['marketplace_search']['feed_units']['edges']
        # iterate over the list of products and yield one item per listing
        for product in products:
            listing = product['node']['listing']
            prod_id = product['node']['story_key']
            yield {
                "name": listing['marketplace_listing_title'],
                "image": listing['primary_listing_photo']['image']['uri'],
                "price": listing['listing_price']['formatted_amount'],
                "url": f"https://www.facebook.com/marketplace/item/{prod_id}",
            }

Output (sample of the first 5 records):

[{"name": "Personagens, pel\u00facias lindas e de qualidade da Black", "image": "https://scontent.fhyd14-2.fna.fbcdn.net/v/t45.5328-4/429928588_24793716480274856_8546570146970955724_n.jpg?stp=c0.43.261.261a_dst-jpg_p261x260&_nc_cat=108&ccb=1-7&_nc_sid=247b10&_nc_ohc=ceFbab6P7S0AX8eOVPa&_nc_ht=scontent.fhyd14-2.fna&oh=00_AfAZIwkb0bEG3bv3TaxDT54uoZdCfYjMVRRR7TEizsZLTw&oe=65EB7194", "price": "R$45", "url": "https://www.facebook.com/marketplace/item/24969789585998853"},
{"name": "Funko Harley Quinn", "image": "https://scontent.fhyd14-1.fna.fbcdn.net/v/t45.5328-4/429821610_7200991503350116_4095918823174063593_n.jpg?stp=c0.43.261.261a_dst-jpg_p261x260&_nc_cat=102&ccb=1-7&_nc_sid=247b10&_nc_ohc=mlyHaWXBChAAX8C6U9Q&_nc_ht=scontent.fhyd14-1.fna&oh=00_AfAv9C9spT7Wwt7VkdvbdW0nVVleLLOF508EXLwmqBcVHg&oe=65EA148C", "price": "R$150", "url": "https://www.facebook.com/marketplace/item/7226874327390390"},
{"name": "Funko Homem Aranha", "image": "https://scontent.fhyd14-2.fna.fbcdn.net/v/t45.5328-4/430690561_7027936467316381_2198370265159815215_n.jpg?stp=c0.7.261.261a_dst-jpg_p261x260&_nc_cat=108&ccb=1-7&_nc_sid=247b10&_nc_ohc=rNzEmNcU4j8AX9C_vAG&_nc_ht=scontent.fhyd14-2.fna&oh=00_AfCNX6a4MpResENwtLkRB54cQoruAE1nelGVC0QcHzdpFw&oe=65EACB55", "price": "R$120", "url": "https://www.facebook.com/marketplace/item/7181301185323208"},
{"name": "Pop! Funko Television The Green Hornet and Kato (2029 Fall Convention - Limited edition)", "image": "https://scontent.fhyd14-1.fna.fbcdn.net/v/t45.5328-4/425299286_7285507438200685_1213036715024683971_n.jpg?stp=c43.0.260.260a_dst-jpg_p261x260&_nc_cat=110&ccb=1-7&_nc_sid=247b10&_nc_ohc=TK7r0aRoKaIAX88Gu2D&_nc_ht=scontent.fhyd14-1.fna&oh=00_AfArqT76xeIJrVTqF_bU60oZYfeNfVZ22NNI9QhYmPB_WQ&oe=65E9D22E", "price": "R$170", "url": "https://www.facebook.com/marketplace/item/7513165018740640"},
{"name": "DESAPEGO DE FUNKO POP LOOSES", "image": "https://scontent.fhyd14-1.fna.fbcdn.net/v/t45.5328-4/428183170_7061261334001976_307640594589911567_n.jpg?stp=c0.43.261.261a_dst-jpg_p261x260&_nc_cat=110&ccb=1-7&_nc_sid=247b10&_nc_ohc=Nj2toiYU698AX8SIDFj&_nc_ht=scontent.fhyd14-1.fna&oh=00_AfDdKvO0WkpdvgS3_Z1Okt6kk95N4cnKD8Imww-ZjxY6uw&oe=65EB8BC4", "price": "$150", "url": "https://www.facebook.com/marketplace/item/7455762907800771"}]
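The index chain data_dict['require'][0][3][0]['__bbox']... depends on the exact nesting of the current payload, so a small change on Facebook's side can break it the same way the HTML selectors broke. Below is a minimal, dependency-free sketch of the same idea, assuming only that the field names the answer reads (marketplace_search, feed_units, edges, node, listing, story_key, and the marker MarketplaceFeedListingStoryObject) keep their names. It pulls the marker <script> out of raw page HTML with the standard-library html.parser and then searches the parsed JSON by key instead of by position. ScriptCollector, load_listing_json, find_key and extract_listings are hypothetical helper names, and the sample page is invented for illustration, not real Facebook markup.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, Optional

# Assumption: same marker string the answer's XPath looks for
MARKER = "MarketplaceFeedListingStoryObject"


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self._buf: list[str] = []
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script, self._buf = True, []

    def handle_data(self, data):
        if self._in_script:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buf))
            self._in_script = False


def load_listing_json(html: str) -> Optional[Any]:
    """Parse the first <script> whose text contains MARKER, or return None."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    for text in parser.scripts:
        if MARKER in text:
            return json.loads(text)
    return None


def find_key(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth of a JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


def extract_listings(data: Any) -> list[dict]:
    """Build the same item dicts as the answer, without a hard-coded index path."""
    items = []
    for search in find_key(data, "marketplace_search"):
        if not isinstance(search, dict):
            continue
        for edge in (search.get("feed_units") or {}).get("edges") or []:
            node = edge.get("node") or {}
            listing = node.get("listing")
            if not listing:
                continue  # skip feed units that carry no listing
            photo = (listing.get("primary_listing_photo") or {}).get("image") or {}
            items.append({
                "name": listing.get("marketplace_listing_title"),
                "image": photo.get("uri"),
                "price": (listing.get("listing_price") or {}).get("formatted_amount"),
                "url": f"https://www.facebook.com/marketplace/item/{node.get('story_key')}",
            })
    return items


# Usage with a tiny, made-up page (not real Facebook markup):
page = (
    '<html><body><script>var unrelated = 1;</script>'
    '<script type="application/json">'
    '{"require": [[{"__typename": "MarketplaceFeedListingStoryObject",'
    ' "marketplace_search": {"feed_units": {"edges": [{"node": {"story_key": "123",'
    ' "listing": {"marketplace_listing_title": "Funko Harley Quinn",'
    ' "primary_listing_photo": {"image": {"uri": "https://example.com/a.jpg"}},'
    ' "listing_price": {"formatted_amount": "R$150"}}}}]}}}]]}'
    '</script></body></html>'
)
print(extract_listings(load_listing_json(page)))
```

Inside the spider you could keep the answer's XPath and json.loads call and replace the products = ... line and the loop with yield from extract_listings(data_dict). The trade-off is that a key-name search can also pick up unrelated blocks that happen to reuse those names, so it is worth comparing the item count with what the page actually shows.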