Select only the bottom of nested divs, without knowing how nested they are

I'm trying to scrape a website that doesn't use class or ids, and the structure is like this:

<div>
  <div>
    <div>
      Some content
    </div>
  </div>
  <div>
    Other content
  </div>
</div>

I tried something like doc.css('div div'), but that returns duplicates of the content, since every nested container also matches that selector.

How do I select only the bottom of the nest, given that the divs are not all at the same depth?

Another way to phrase the question: is there a way to select a "div with no div children"? It may have other children, just not divs.

Edit:

Trying to clarify, with the above html I can call:

doc.css('div div').map(&:text)

To get the text of the document, divided into an array by the divs. The problem is that this line returns "Some content" twice: even though it appears only once in the html, two 'div div' matches contain that text.

Best answer (by twalow):

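This JavaScript (browser DOM) code finds all the leaf elements, meaning elements with no child elements, and then keeps only the ones that are divs. I'm assuming this is what you're trying to do. Note that a div whose only children are non-div elements is not a leaf, so it will not be returned.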

// will be used to store all the leaves
const leaves = [];

// recursively collects every element that has no child elements
const findLeaves = ($branch) => {
    if ($branch.children.length === 0) {
        leaves.push($branch);
        return;
    }
    [...$branch.children].forEach(($child) => findLeaves($child));
};


// parent element of elements you want to search through
const $branch = document.querySelector("body > div");

// initiate finding leaves
findLeaves($branch);

// keep only the leaves that are divs
const what_you_want = leaves.filter(($leaf) => $leaf.tagName === "DIV");
console.log(what_you_want);
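
The leaf search above skips a div that contains only non-div children such as a span, because the span is the leaf. Below is a minimal sketch of the "div with no div descendants" reading of the question. The names innermostDivs, hasDivDescendant, isDiv, el and the sample tree are hypothetical, made up for illustration, and not part of the original answer. The code relies only on tagName and children, so it should work on browser elements as well as on the plain-object stand-ins used here.

```javascript
// true when the node is a <div> (tagName is upper case in HTML documents)
const isDiv = (node) => node.tagName === "DIV";

// true when any descendant of `node`, at any depth, is a <div>
const hasDivDescendant = (node) =>
    [...node.children].some((child) => isDiv(child) || hasDivDescendant(child));

// returns every <div> at or below `root` that has no <div> descendants
const innermostDivs = (root) => {
    const found = [];
    const visit = (node) => {
        if (isDiv(node) && !hasDivDescendant(node)) {
            found.push(node);
            return; // nothing below this node can match
        }
        [...node.children].forEach((child) => visit(child));
    };
    visit(root);
    return found;
};

// Hypothetical stand-in for the question's HTML, with a <span> added
// inside the innermost div to show that non-div children are allowed.
const el = (tagName, text = "", children = []) => ({ tagName, text, children });
const tree = el("DIV", "", [
    el("DIV", "", [el("DIV", "Some content", [el("SPAN", "note")])]),
    el("DIV", "Other content"),
]);

const texts = innermostDivs(tree).map((div) => div.text);
console.log(texts); // [ 'Some content', 'Other content' ]
```

In a browser, pass a real element instead, for example innermostDivs(document.body). Where the CSS :has() pseudo-class is supported, document.querySelectorAll("div:not(:has(div))") should give the same set, and the equivalent XPath condition is //div[not(.//div)]. Whether a given scraping library accepts either form is something to check against its documentation.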