Use Puppeteer to scrape paragraph inner text and image title from a table td


[Image: table]

I have a table with this structure, and I want to scrape the image title and the paragraph text from the td with class 'description'. I have tried several ways with no luck. Please help me with this, guys; I am really stuck here.

I think my question is clear. So far I have:

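    let descs = await page.evaluate(() => {
      // Collect the <p> elements inside every td.description of the even rows.
      let desc = Array.from(document.querySelectorAll('tr.even td.description p'));
      // Drop empty paragraphs, then strip line breaks and double spaces.
      return desc.filter((p) => p.innerText !== "").map(p => p.innerText.replace(/  |\r\n|\n|\r/gm, ""));
    });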

With this code I get the paragraph text, but how can I also get the img title?
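A note on the cleanup regex in the code above: /  |\r\n|\n|\r/gm deletes line breaks outright and removes spaces two at a time, so words split across lines (or separated by four spaces) end up glued together. The sketch below contrasts it with a common alternative that collapses every whitespace run to a single space and trims the ends. The function names are hypothetical, and the collapsing approach is a substitution, not what the question uses.

```javascript
// Cleanup as written in the question: deletes pairs of spaces and every line break.
function cleanLikeQuestion(text) {
  return text.replace(/  |\r\n|\n|\r/gm, "");
}

// Hypothetical alternative: turn any run of whitespace (spaces, tabs, CR/LF,
// non-breaking spaces) into one space, then trim leading/trailing whitespace.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

const sample = "First line\n  second line";
console.log(cleanLikeQuestion(sample));  // "First linesecond line"
console.log(collapseWhitespace(sample)); // "First line second line"
```

Either function can be dropped into the .map(...) call in place of the inline replace(), as long as it is defined inside the page function (Puppeteer does not carry Node.js-side helpers into the page).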

Best answer, by Yaroslavm

Given the HTML structure you provided, I suggest selecting the td elements and running $$eval with a map over them.

Here, texts reuses the filter/map logic you already wrote for the p elements, and title is read from each td with querySelector and the img[src] selector.

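```javascript
// Inside an async function, with `page` already on the target URL.
// Wait until at least one matching cell is in the DOM.
await page.waitForSelector('tr.even td.description');
// Run the callback in the page with all matching tds; return one object per cell.
const data = await page.$$eval('tr.even td.description', tds =>
  tds.map(td => ({
    // Non-empty paragraph texts, cleaned the same way as in the question.
    texts: Array.from(td.querySelectorAll('p'))
      .filter(p => p.innerText !== '')
      .map(p => p.innerText.replace(/  |\r\n|\n|\r/gm, '')),
    // Title attribute of the image in this cell (undefined if there is none).
    title: td.querySelector('img[src]')?.getAttribute('title'),
  }))
);
```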
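Because Puppeteer serializes the function passed to $$eval and runs it inside the page, that callback cannot rely on variables or helpers from the surrounding Node.js scope. One way to keep it reusable and testable is to write it as a single self-contained function and pass it by name, as sketched below. The names extractDescriptions and makeTd are hypothetical, and the stub objects only mimic the two DOM methods the function calls so the logic can be exercised outside a browser.

```javascript
// Self-contained page function: everything it needs is defined inside it,
// so it still works after Puppeteer stringifies it for page.$$eval.
const extractDescriptions = (tds) =>
  tds.map((td) => ({
    // Non-empty paragraph texts, with the question's cleanup applied.
    texts: Array.from(td.querySelectorAll("p"))
      .filter((p) => p.innerText !== "")
      .map((p) => p.innerText.replace(/  |\r\n|\n|\r/gm, "")),
    // Title attribute of the first img with a src; undefined if there is none.
    title: td.querySelector("img[src]")?.getAttribute("title"),
  }));

// In a real script (inside an async function):
//   const data = await page.$$eval("tr.even td.description", extractDescriptions);

// Hypothetical stub standing in for a td element in a plain Node.js run.
const makeTd = (paragraphs, imgTitle) => ({
  querySelectorAll: (sel) =>
    sel === "p" ? paragraphs.map((innerText) => ({ innerText })) : [],
  querySelector: (sel) =>
    sel === "img[src]" && imgTitle !== null
      ? { getAttribute: (name) => (name === "title" ? imgTitle : null) }
      : null,
});

const result = extractDescriptions([
  makeTd(["Hello\nworld", ""], "Product photo"),
  makeTd(["No image here"], null),
]);
console.log(JSON.stringify(result));
// [{"texts":["Helloworld"],"title":"Product photo"},{"texts":["No image here"]}]
```

The page function returns plain objects, which Puppeteer can serialize back to Node.js; returning the DOM nodes themselves would not give you usable objects on the Node.js side.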