Cheerio/Phantom scraping not returning page/working as I thought

const axios = require("axios");
const cheerio = require("cheerio");
const phantom = require('phantom');
const pretty = require('pretty');

const main = async () => {
    const instance = await phantom.create();
    const page = await instance.createPage();
    // PhantomJS names this callback 'onResourceRequested'
    await page.on('onResourceRequested', function (requestData) {
        console.info('Requesting', requestData.url);
    })

    const url = 'https://www.smogon.com/dex/sm/pokemon/gengar/ou/';
    console.log('URL::', url);

    const status = await page.open(url);
    console.log('STATUS::', status);

    const content = await page.property('content');
    // console.log('CONTENT::', content);

    const html = content;
    const $ = cheerio.load(html);

    // console.log(pretty($.html())) // Loads the HTML of the page including h1 tags

    const headersOne = $('h1');
    console.log($(headersOne).text()); // null

    const divs = $('div');
    console.log($(divs).text()); // 'Loading...'

    await instance.exit()
}

main().catch(console.log);

I've used cheerio before and have many projects like this one, but here something is off. I can print the entire HTML with the $.html() console.log, yet when I try to access things like h1 tags I receive null. The divs also come back as 'Loading...', and if I select the title it returns the title tag but without its content. I also tried fetching the page with axios and setting timeouts, and that didn't work either.

The weirdest part is that I can see the HTML, and the loaded content is inside it; I just can't access it with cheerio.

Would really appreciate some help/advice.

Thank you in advance.

I'm expecting it to work like this. (I know this uses axios, but it's the same concept.)

        axios(URL)
            .then(response => {
                const html = response.data;
                const $ = cheerio.load(html);

                const headings = $('h2');
                headings.each((index, element) => {
                    const heading = $(element).text();
                    console.log(heading); // returns the h2 on the page and the content.
                });
            })
            .catch(console.log);
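
Since the divs come back as 'Loading...', one possibility (an assumption, not something confirmed for this site) is that the page fills in its content with client-side script after page.open resolves, so the snapshot is taken too early. A minimal sketch of polling until the rendered content appears is below. `waitFor` and `sleep` are hypothetical helpers written for this example, not part of phantom or cheerio, and the 'h1' selector and timing values are placeholders.

```javascript
// Hypothetical helpers for illustration; not part of phantom or cheerio.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `check` repeatedly until it resolves to a truthy value (returns true)
// or until `timeout` milliseconds have passed (returns false).
async function waitFor(check, { interval = 250, timeout = 10000 } = {}) {
    const deadline = Date.now() + timeout;
    while (Date.now() < deadline) {
        if (await check()) return true;
        await sleep(interval);
    }
    return false;
}

// Possible use inside main(), after page.open(url) and before reading 'content'.
// Assumes the rendered page eventually contains at least one h1 element:
//
//   const ready = await waitFor(() =>
//       page.evaluate(function () {
//           return document.querySelectorAll('h1').length > 0;
//       })
//   );
//   if (!ready) console.warn('Timed out waiting for rendered content');
//   const content = await page.property('content');
```

If the poll times out, the content may be produced some other way (for example, embedded as data in a script tag), in which case waiting longer would not help.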