I'm having a problem with my Scrapy spider. Based on a search key, it scrapes a search results page and extracts the links, but the next yield scrapy.Request(), which should scrape the individual result pages, never reaches its callback. I call the spider from main.py, which takes the search key from command-line arguments. Here is main.py:
import argparse

from scrapy.crawler import CrawlerProcess


def run_spider(key, format):
    # The feed setting controls the export format of the scraped items
    process = CrawlerProcess(settings={
        'FEED_FORMAT': format,
    })
    process.crawl(mySpider, search_key=key)
    process.start()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog='main.py')
    parser.add_argument('-s', '--key', required=True)
    parser.add_argument('-f', '--format', default='json', choices=['json', 'csv', 'xml'],
                        help='Output format for scraped data (default: json)')
    args = parser.parse_args()
    # Make sure the tables exist before the spider starts writing to them
    database_manager.create_tables(
        models=[SearchKey, SearchResult, Author, Book, BookAuthor])
    run_spider(args.key, args.format)
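A typical invocation looks like this (the search key here is just an example):

python main.py -s "some search key" -f json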
And here is the spider code:
import scrapy


class mySpider(scrapy.Spider):
    name = 'mySpider'

    def start_requests(self):
        search_key = getattr(self, 'search_key', None)
        if search_key:
            url = base_url + f'?req={search_key}'
            # Save the search key and pass its id to the callback via meta
            new_search = SearchKey.create(search_key=search_key)
            search_id = new_search.id
            yield scrapy.Request(url, callback=self.parse, meta={'search_id': search_id})

    def parse(self, response):
        # Extracting search results
        search_id = response.meta['search_id']
        # some scraping to get search result links
        for link in links:
            url = base_url + link
            yield scrapy.Request(url, callback=self.parse_result)  # this line is executed but parse_result() is never entered

    def parse_result(self, response):
        # Extracting result
        pass
I traced the code: the line that yields the request to parse_result is executed, but parse_result() itself is never entered. I checked my pipeline and settings and didn't see any problems.
Can anyone help, please?
I found it: allowed_domains.
The site had another domain, and at some point I changed my search URL to that domain but forgot to update allowed_domains, so Scrapy's offsite middleware dropped every request to it before parse_result could run.
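In other words, every domain the spider requests has to appear in allowed_domains. A minimal sketch of the fix (the hostnames here are placeholders, not the real site):

# Before (hypothetical domains): only the old domain was listed, so
# requests to the new one were filtered by the OffsiteMiddleware
allowed_domains = ['old-site.example']

# After: list every domain the spider actually requests
allowed_domains = ['old-site.example', 'new-site.example']

The giveaway is in the crawl log at DEBUG level: the OffsiteMiddleware prints a line like "Filtered offsite request to 'new-site.example'" for each dropped request, and an offsite/filtered counter shows up in the final crawl stats.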