I'm using Scrapy to scrape this website. I want to grab all the div elements with class="data1", but neither CSS nor XPath selectors find them, even though I can see them in the HTML in the browser.
In the scrapy shell after fetching the url:
In [6]: response.css('div#my_div')
Out[6]: [<Selector query="descendant-or-self::div[@id = 'my_div']" data='<div id="my_div">Results will be show...'>]
In [7]: response.css('div#my_div div')
Out[7]: []
In [8]: response.xpath('//div[@class="data1"]')
Out[8]: []
The html looks something like this:
<div id="my_div" style>
<div class="data1">...</div>
<div class="data1">...</div>
<div class="data1">...</div>
...
</div>
This is because that portion of the site is rendered with JavaScript. You can see this if you call .get() on the first query in your example: the div that Scrapy receives contains only the placeholder text (the "Results will be show..." visible in your Out[6]) and none of the data1 children, which the browser adds later.
If you look in the network tab of the browser developer tools, you can see that all of that information comes from an API call to 'https://data.crn.com/2023/wotc2023.php?st1=1&st2=a', which, when fetched via the scrapy shell, returns a JSON object with all the information in that section.