I am working on a web scraping project that uses Scrapy to extract job titles, locations, and salary information from Indeed job listings. I can extract the titles and locations, but I'm having trouble extracting the salary information correctly.
The salary information is stored in the "salarySnippet" dictionary in the JSON response, but I can't work out how to access the salary text and currency values inside it. Here's a snippet of my current code:
import scrapy
import re
import json


class IndeedJobSpider(scrapy.Spider):
    name = 'indeed_jobs'
    start_urls = ['https://in.indeed.com/jobs?q=python+developer&l=Kerala&vjk=372ecad0fcaac92a']

    def parse(self, response):
        # Find the JSON blob that the page assigns to the job cards provider
        script_tag = re.findall(r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]=(\{.+?\});', response.text)
        # re.findall returns a list (never None), so check that it is non-empty
        if script_tag:
            json_blob = json.loads(script_tag[0])
            jobs_list = json_blob['metaData']['mosaicProviderJobCardsModel']['results']
            for job in jobs_list:
                job_name = job.get('displayTitle', 'N/A')
                job_location = job.get('formattedLocation', 'N/A')
                # Get the 'salarySnippet' dictionary (empty dict if the key is missing)
                salary_snippet = job.get('salarySnippet', {})
                print(f"Job Name: {job_name}")
                print(f"Job Location: {job_location}")
                print(f"Salary Info: {salary_snippet}")
                print("-" * 40)
Output:
Job Name: Python Developer
Job Location: Thiruvananthapuram, Kerala
Salary Info: N/A
----------------------------------------
Job Name: Python Developer
Job Location: Edapalli, Kerala
Salary Info: N/A
----------------------------------------
Job Name: Python Developer
Job Location: Ernakulam, Kerala
Salary Info: N/A
----------------------------------------
Job Name: Python Django Developer
Job Location: Kochi, Kerala
Salary Info: N/A
----------------------------------------
Job Name: Backend Developer (Python)
Job Location: Kochi, Kerala
Salary Info: N/A
Console data:
salarySnippet: {currency: 'INR', salaryTextFormatted: false, source: 'EXTRACTION', text: '₹40,000 - ₹70,000 a month'}
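For reference, here is a minimal sketch of how the two fields could be read, assuming each parsed job dictionary has the same shape as the console data above. The helper name `extract_salary` and the `sample_job` dictionary are hypothetical and only serve the illustration. I have not confirmed that every job card in the real response includes these keys, so the sketch falls back to 'N/A' when they are missing.

```python
def extract_salary(job):
    """Return (salary_text, currency) for one job dict, or 'N/A' where missing."""
    # 'salarySnippet' may be absent or None, so fall back to an empty dict
    snippet = job.get('salarySnippet') or {}
    salary_text = snippet.get('text') or 'N/A'
    currency = snippet.get('currency') or 'N/A'
    return salary_text, currency


# Hypothetical job entry shaped like the console data shown above
sample_job = {
    'displayTitle': 'Python Developer',
    'salarySnippet': {
        'currency': 'INR',
        'salaryTextFormatted': False,
        'source': 'EXTRACTION',
        'text': '₹40,000 - ₹70,000 a month',
    },
}

salary_text, currency = extract_salary(sample_job)
print(f"Salary Text: {salary_text}")
print(f"Currency: {currency}")
```

Inside the spider's loop, this would be called as `extract_salary(job)` in place of printing the whole `salary_snippet` dictionary.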