I'm having a problem with my Scrapy spider. Based on a search key, it scrapes a search results page and extracts the links, but the next yield scrapy.Request(), which should scrape the individual result pages, never reaches its callback. I call the spider from main.py, which takes the search key from command-line arguments. Here is main.py:
import argparse

from scrapy.crawler import CrawlerProcess


def run_spider(key, format):
    # The feed setting controls the export format of the scraped items
    process = CrawlerProcess(settings={
        'FEED_FORMAT': format,
    })
    process.crawl(mySpider, search_key=key)
    process.start()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog='main.py')
    parser.add_argument('-s', '--key', required=True)
    parser.add_argument('-f', '--format', default='json', choices=['json', 'csv', 'xml'],
                        help='Output format for scraped data (default: json)')
    args = parser.parse_args()
    # Make sure the tables exist before the spider starts writing to them
    database_manager.create_tables(
        models=[SearchKey, SearchResult, Author, Book, BookAuthor])
    run_spider(args.key, args.format)
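A typical invocation looks like this (the search key here is just an example):

python main.py -s "some search key" -f json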
And here is the spider code:
import scrapy


class mySpider(scrapy.Spider):
    name = 'mySpider'

    def start_requests(self):
        search_key = getattr(self, 'search_key', None)
        if search_key:
            url = base_url + f'?req={search_key}'
            # Save the search key and pass its id to the callback via meta
            new_search = SearchKey.create(search_key=search_key)
            search_id = new_search.id
            yield scrapy.Request(url, callback=self.parse, meta={'search_id': search_id})

    def parse(self, response):
        # Extracting search results
        search_id = response.meta['search_id']
        # some scraping to get search result links
        for link in links:
            url = base_url + link
            yield scrapy.Request(url, callback=self.parse_result)  # this line is executed but parse_result() is never entered

    def parse_result(self, response):
        # Extracting result
        pass
I traced the code: the line that yields the request to parse_result is executed, but parse_result() itself is never entered. I checked my pipeline and settings and didn't see any problems.
Can anyone help, please?
I found it: allowed_domains.
The site had another domain, and at some point I changed my search URL to that domain but forgot to update allowed_domains, so Scrapy's offsite middleware dropped every request to it before parse_result could run.
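In other words, every domain the spider requests has to appear in allowed_domains. A minimal sketch of the fix (the hostnames here are placeholders, not the real site):

# Before (hypothetical domains): only the old domain was listed, so
# requests to the new one were filtered by the OffsiteMiddleware
allowed_domains = ['old-site.example']

# After: list every domain the spider actually requests
allowed_domains = ['old-site.example', 'new-site.example']

The giveaway is in the crawl log at DEBUG level: the OffsiteMiddleware prints a line like "Filtered offsite request to 'new-site.example'" for each dropped request, and an offsite/filtered counter shows up in the final crawl stats.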