How do you read responses from cheerio?

I'm having a bit of difficulty figuring out how to read cheerio's responses after running the following:

const axios = require('axios')
const cheerio = require('cheerio')
axios.get('https://bulbapedia.bulbagarden.net/wiki/Galar_Route_5')
    .then(({data}) => {
        const $ = cheerio.load(data)

        const tableData = $('table:first').after('span#Hidden_encounters')
        console.log(tableData)
    })

When I run the above code, I get quite a lengthy response:

LoadedCheerio {
  '0': <ref *1> Element {
    parent: Element {
      parent: [Element],
      prev: null,
      next: [Text],
      startIndex: null,
      endIndex: null,
      children: [Array],
      name: 'div',
      attribs: [Object: null prototype],
      type: 'tag',
      namespace: 'http://www.w3.org/1999/xhtml',
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype]
    },
    prev: null,
    next: Text {
      parent: [Element],
      prev: [Circular *1],
      next: [Text],
      startIndex: null,
      endIndex: null,
      data: 'span#Hidden_encounters',
      type: 'text'
    },
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element] ],
    name: 'table',
    attribs: [Object: null prototype] {
      class: 'roundy',
      style: 'background: #75C977; width: 30%; max-width: 30%; margin-left: 5px; margin-bottom: 5px; border: 3px solid #4AA14D; float:right; text-align:center'
    },
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype] { class: undefined, style: undefined },
    'x-attribsPrefix': [Object: null prototype] { class: undefined, style: undefined }
  },
  length: 1,
  options: { xml: false, decodeEntities: true },
  _root: <ref *2> LoadedCheerio {
    '0': Document {
      parent: null,
      prev: null,
      next: null,
      startIndex: null,
      endIndex: null,
      children: [Array],
      type: 'root',
      'x-mode': 'no-quirks'
    },
    length: 1,
    options: { xml: false, decodeEntities: true },
    _root: [Circular *2]
  },
  prevObject: <ref *3> LoadedCheerio {
    '0': Document {
      parent: null,
      prev: null,
      next: null,
      startIndex: null,
      endIndex: null,
      children: [Array],
      type: 'root',
      'x-mode': 'no-quirks'
    },
    length: 1,
    options: { xml: false, decodeEntities: true },
    _root: [Circular *3]
  }
}

I read through the docs, and there doesn't appear to be anything in them about how to read or interpret the response.

I have searched elsewhere (YouTube, here, DuckDuckGo), and I haven't been able to find any resources that explain how to interpret the response.

Best answer, by ggorlen:

What you see is a Cheerio node object[1], not an HTTP response. You don't necessarily need to be able to read it, per se, although some useful information is visible in it. Just call functions from the Cheerio API, such as .text(), .find(), .map() and so on, to manipulate the object and extract data from it. Ultimately, the idea is to process the Cheerio node tree into some vanilla JS data structure or primitive value(s).

For this example, although you didn't share your goal or desired output, I think you want the following table data, more or less:

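const axios = require("axios"); // ^1.6.8
const cheerio = require("cheerio"); // ^1.0.0-rc.12

axios
  .get("<Your URL>")
  .then(({data}) => {
    const $ = cheerio.load(data);

    // Find the heading that contains the "Hidden encounters" anchor,
    // then the table immediately following that heading, then its rows.
    const rows = $("#Hidden_encounters")
      .closest("h1, h2, h3, h4, h5, h6")
      .next("table")
      .find("tr");

    // Step 1: iterate the rows. Step 2: collect each row's trimmed cell text.
    const [headers, ...tableData] = [...rows]
      .map(row =>
        [...$(row).find("th, td")].map(cell => $(cell).text().trim())
      )
      // Skip rows with fewer than two cells (e.g. title or spacer rows).
      .filter(cells => cells.length > 1);

    console.log(headers);
    console.table(tableData);
  })
  .catch(err => console.error(err));

Output:
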
[ 'Pokémon', 'Games', 'Location', 'Levels', 'Rate' ]
┌─────────┬────────────┬──────┬──────┬──────────────┬─────────┬────────┐
│ (index) │ 0          │ 1    │ 2    │ 3            │ 4       │ 5      │
├─────────┼────────────┼──────┼──────┼──────────────┼─────────┼────────┤
│ 0       │ 'Lombre'   │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '20%'  │
│ 1       │ 'Nuzleaf'  │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '20%'  │
│ 2       │ 'Nincada'  │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '5%'   │
│ 3       │ 'Espurr'   │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '20%'  │
│ 4       │ 'Spritzee' │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '14%'  │
│ 5       │ 'Swirlix'  │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '14%'  │
│ 6       │ 'Dewpider' │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '5%'   │
│ 7       │ 'Dottler'  │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '26%'  │
│ 8       │ 'Applin'   │ 'Sw' │ 'Sh' │ 'Grass'      │ '16-18' │ '10%'  │
│ 9       │ 'Skwovet'  │ 'Sw' │ 'Sh' │ 'Berry tree' │ '16-18' │ '100%' │
│ 10      │ 'Goldeen'  │ 'Sw' │ 'Sh' │ 'Fishing'    │ '16-18' │ '10%'  │
│ 11      │ 'Magikarp' │ 'Sw' │ 'Sh' │ 'Fishing'    │ '16-18' │ '70%'  │
│ 12      │ 'Chewtle'  │ 'Sw' │ 'Sh' │ 'Fishing'    │ '16-18' │ '20%'  │
└─────────┴────────────┴──────┴──────┴──────────────┴─────────┴────────┘

More work can be done to group the two 'Games' cells into one, but this at least gets you started and handles the Cheerio part.
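One way to do that grouping, sketched under an assumption you should check against the real page: every data row has exactly one more cell than the header row, because the 'Games' column is split across two adjacent cells (as in the output above). `mergeGamesColumn` is a hypothetical helper; rows with any other shape are returned unchanged rather than guessed at.

```javascript
// Hypothetical helper. ASSUMPTION: a data row with exactly one more cell than
// the header row has the "Games" value split across two adjacent cells.
function mergeGamesColumn(headers, rows) {
  const gamesIndex = headers.indexOf("Games");
  return rows.map(row => {
    if (gamesIndex === -1 || row.length !== headers.length + 1) {
      return row; // leave rows that don't match the assumed shape alone
    }
    // Join the two game cells (skipping empty ones) into a single value.
    const games = row
      .slice(gamesIndex, gamesIndex + 2)
      .filter(Boolean)
      .join(", ");
    return [...row.slice(0, gamesIndex), games, ...row.slice(gamesIndex + 2)];
  });
}

// Sample rows copied from the output above.
const headers = ["Pokémon", "Games", "Location", "Levels", "Rate"];
const rows = [
  ["Lombre", "Sw", "Sh", "Grass", "16-18", "20%"],
  ["Skwovet", "Sw", "Sh", "Berry tree", "16-18", "100%"],
];

const merged = mergeGamesColumn(headers, rows);
// Key each row by header name so console.table shows named columns.
console.table(
  merged.map(row => Object.fromEntries(headers.map((h, i) => [h, row[i]])))
);
```

Keying the rows by header name at the end is optional, but it makes the result easier to serialize (e.g. to JSON) than positional arrays.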

Many other threads exist on scraping tables with Cheerio. Taking a look at the following answers should clarify the two-step pattern of scraping rows, then cells, with nested .map calls as shown above:

[1]: According to an answer on a question similar to yours, Understanding Cheerio object and get attributes, this is really a parse5 node object (unless the Cheerio internals or API have changed since 2020, or the linked answer is inaccurate; I haven't verified it). Ultimately, though, I suggest asking about your fundamental problem of how to extract data. The exact object type is probably not important, beyond it being a Cheerio node of some sort.