How do I scrape data from within an HTML element based on an existing URL?

I have a script that saves RSS feed data to a Google Sheets spreadsheet, but it does not yet do everything I need.

So far the script retrieves the title, publication time, and article link of each feed item: https://i.stack.imgur.com/9YTAF.png

I want the script to also open each article link and retrieve a description from a given HTML tag or class, so that each row contains the title, time, description, and article link.

For example, I want to retrieve the text of the div with the class entry-content from the article at https://e-ficiencia.com/samsung-climate-solutions-acudira-cyr-2023/

I would like the resulting spreadsheet to look like this:

https://i.stack.imgur.com/kT00s.png https://docs.google.com/spreadsheets/d/1lPn7xHEEI1NknN8l9w6hu4SkQburm8s-NdAjPsPc-NM/edit#gid=0

Here is my current Google Apps Script:

function myFunction() {
  getURLData();
}

function getURLData() {
  var currentData = [];
  var urltoCheck = [
    "https://e-ficiencia.com/feed/",
    "https://www.climanoticias.com/feed/all",
    "https://www.proinstalaciones.com/actualidad/noticias?format=feed"
  ];

  // Fetch each feed and collect one row per <item>.
  for (var i = 0; i < urltoCheck.length; i++) {
    var ficiencaData = UrlFetchApp.fetch(urltoCheck[i]);
    var xml = ficiencaData.getContentText();
    let response = XmlService.parse(xml);
    var root = response.getRootElement();
    let channel = root.getChild('channel');
    let items = channel.getChildren('item');
    items.forEach(item => {
      let title = item.getChild('title').getText();
      let pubDateb = item.getChild('pubDate').getText();
      let link = item.getChild('link').getText();
      currentData.push([title, pubDateb, link]);
    });
  }

  // Nothing to write if no items were found; currentData[0] would be undefined.
  if (currentData.length === 0) return;

  // Append the collected rows below the last used row of Sheet1.
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("Sheet1");
  var currentDataRange = sheet.getRange(sheet.getLastRow() + 1, 1, currentData.length, currentData[0].length);
  currentDataRange.setValues(currentData);
}
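
For the description column, one possible approach is to fetch each article page and pull the text out of the target div by hand, since XmlService generally cannot parse real-world HTML. The following is only a minimal sketch under stated assumptions: extractDivText and fetchDescription are hypothetical helper names, not part of Apps Script, and the class name entry-content is taken from the e-ficiencia.com example, so the other feeds may need a different class. The regex-based matching is deliberately simple and may fail on unusual markup.

```javascript
/**
 * Hypothetical helper: returns the plain text inside the first <div> whose
 * class list contains className, or "" if no such div is found.
 */
function extractDivText(html, className) {
  // Scan opening div tags that carry a class attribute until one matches.
  var openRe = /<div\b[^>]*\bclass\s*=\s*(["'])([^"']*)\1[^>]*>/gi;
  var open;
  while ((open = openRe.exec(html)) !== null) {
    if (open[2].split(/\s+/).indexOf(className) !== -1) break;
  }
  if (!open) return '';
  var start = open.index + open[0].length;

  // Walk the following div tags, tracking nesting depth to find the matching close.
  var tagRe = /<(\/?)div\b[^>]*>/gi;
  tagRe.lastIndex = start;
  var depth = 1;
  var end = html.length;
  var m;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[1] ? -1 : 1;
    if (depth === 0) {
      end = m.index;
      break;
    }
  }

  // Drop scripts and styles, strip remaining tags, decode two common entities,
  // and collapse whitespace.
  return html.slice(start, end)
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    .replace(/\s+/g, ' ')
    .trim();
}

/**
 * Hypothetical helper for Apps Script: fetches an article page and returns the
 * text of its entry-content div (assumed class name), or "" on failure.
 */
function fetchDescription(url) {
  // muteHttpExceptions keeps one failing article from aborting the whole run.
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) return '';
  return extractDivText(response.getContentText(), 'entry-content');
}
```

Inside the items.forEach loop of getURLData, the row could then be built with fetchDescription(link) placed between the date and the link in the array passed to currentData.push, which gives the title, time, description, article link layout. Each item triggers one extra request, so a large feed will run noticeably slower, and because a spreadsheet cell holds at most 50,000 characters, very long articles may need to be truncated before writing.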

