Cannot find html element using css or xpath selectors in Scrapy


I'm using Scrapy to scrape this website and want to grab all the div elements with class="data1". I'm using CSS and XPath selectors to do so, but neither finds these elements, even though I can see them in the page's HTML in the browser.

In the Scrapy shell, after fetching the URL:

In [6]: response.css('div#my_div')
Out[6]: [<Selector query="descendant-or-self::div[@id = 'my_div']" data='<div id="my_div">Results will be show...'>]

In [7]: response.css('div#my_div div')
Out[7]: []

In [8]: response.xpath('//div[@class="data1"]')
Out[8]: []

The HTML looks something like this:

<div id="my_div" style>
 <div class="data1">...</div>
 <div class="data1">...</div>
 <div class="data1">...</div>
 ...
</div>
Best answer, by Alexander:

This is because that part of the page is rendered by JavaScript in the browser. Scrapy only sees the HTML the server originally sends, before any JavaScript runs. You can confirm this by calling .get() on the first query from your example:

In [1]: response.css('div#my_div').get()

Out[1]: '<div id="my_div">Results will be shown here.</div>'
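To see why the selectors come back empty, it helps to run the same query against both versions of the markup. The sketch below uses Python's standard-library xml.etree.ElementTree rather than Scrapy's own selectors (which are built on parsel and lxml), purely so it has no dependencies; it only works here because both snippets are well-formed. RAW_HTML is the placeholder shown above, and RENDERED_HTML is a hypothetical, simplified stand-in for what the browser shows after its JavaScript has filled in the results.

```python
import xml.etree.ElementTree as ET

# What Scrapy downloads: the server's HTML before any JavaScript runs
# (the placeholder returned by response.css('div#my_div').get()).
RAW_HTML = '<div id="my_div">Results will be shown here.</div>'

# Hypothetical, simplified version of the markup the browser displays
# after its JavaScript has inserted the results.
RENDERED_HTML = """
<div id="my_div">
  <div class="data1">...</div>
  <div class="data1">...</div>
  <div class="data1">...</div>
</div>
"""


def find_data1(html: str) -> list:
    """Return every <div class="data1"> below the root element."""
    root = ET.fromstring(html.strip())
    # ElementTree supports this small XPath subset; it mirrors the
    # question's //div[@class="data1"] query.
    return root.findall(".//div[@class='data1']")


print(len(find_data1(RAW_HTML)))       # 0: the divs are not in the raw HTML
print(len(find_data1(RENDERED_HTML)))  # 3: same query on the rendered markup
```

The query is identical in both calls; only the input changes. That is the situation in the shell session: the selector is fine, but the elements it targets never reach Scrapy.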

If you inspect the Network tab in your browser's developer tools, you can see that all of that information comes from an API call to 'https://data.crn.com/2023/wotc2023.php?st1=1&st2=a'. Fetching that URL in the Scrapy shell returns JSON (a list of objects) containing all the information in that section.

In [3]: fetch('https://data.crn.com/2023/wotc2023.php?st1=1&st2=a')
2023-05-08 20:57:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://data.crn.com/2023/wotc2023.php?st1=1&st2=a> (referer: None)

In [4]: response.json()
Out[4]: 
[{'Pkey': '617',
  'Company': 'F5',
  'Name_First': 'Barbara',
  'Name_Last': 'Abboud',
  'Image': 'f5-abboud-barbara.jpg'},
 {'Pkey': '1208',
  'Company': 'Samsung Electronics America',
  'Name_First': 'Shpresa',
  'Name_Last': 'Abdullaj',
  'Image': 'samsung-electronics-america-abdullaj-shpresa.jpg'},
 {'Pkey': '499',
  'Company': 'Davenport Group',
  'Name_First': 'Kim',
  'Name_Last': 'Abrams',
  'Image': 'davenport-group-abrams-kim.jpg'},
 {'Pkey': '35',
  'Company': 'Alteryx',
  'Name_First': 'Daniella',
  'Name_Last': 'Aburto Valle',
  'Image': 'alteryx-aburto-valle-daniella.jpg'},
  .......]