Select all <table> elements without classes or ids with BeautifulSoup

I am trying to select all <table> elements on some web pages with BeautifulSoup. The table elements do not have specific classes or ids.

import bs4
import requests

def get_keycode_soup(url):
    res = requests.get(url)
    res.raise_for_status()
    return bs4.BeautifulSoup(res.text, features="html.parser")

def parse_qmk_soup():
    qmk_soup = get_keycode_soup("https://docs.qmk.fm/#/keycodes")
    tables = qmk_soup.select("table")
    # pass line for breakpoint
    pass

def main():
    parse_qmk_soup()

if __name__ == "__main__":
    main()

I have also tried selecting the table elements and their rows with

tables = qmk_soup.find_all("table")
# and
table_rows = qmk_soup.find_all("tr")

Whenever I pause the debugger on the pass line, tables is always an empty list.

I have tried methods similar to those in this post and this post, but since the tables I'm trying to select have no other descriptive attributes, iterating feels inefficient.

Is there a way to simply select all the <table> elements on their own?

Edit: as @DeepSpace suggested below, the page appears to require JavaScript to load the tables. See also the answer from @MendelG, which traces where the data is loaded from, in case you want to fetch the data from the source directly.
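For reference, select("table") and find_all("table") already match every <table> element regardless of class or id, and both return an empty list (never None) when nothing matches. So the empty result points at the HTML that requests receives, not at the selector. A minimal sketch with two hypothetical inline documents: one server-rendered page containing tables, and one assumed JavaScript "shell" page of the kind a client-side app typically serves before its scripts run. count_tables is a hypothetical helper.

```python
import bs4

# Hypothetical server-rendered HTML: two tables with no class or id attributes.
STATIC_HTML = """
<html><body>
  <table><tr><td>KC_A</td><td>a and A</td></tr></table>
  <p>Some text</p>
  <table><tr><td>KC_B</td><td>b and B</td></tr></table>
</body></html>
"""

# Assumed shape of a JS-rendered page: only an empty mount point and a script tag.
JS_SHELL_HTML = (
    '<html><body><div id="app">Loading...</div>'
    '<script src="app.js"></script></body></html>'
)


def count_tables(html: str) -> tuple[int, int]:
    """Count <table> elements using both the CSS-selector and tag-name APIs."""
    soup = bs4.BeautifulSoup(html, features="html.parser")
    via_select = soup.select("table")      # CSS type selector: every <table>
    via_find_all = soup.find_all("table")  # same elements via the tag-name API
    return len(via_select), len(via_find_all)


print(count_tables(STATIC_HTML))    # (2, 2)
print(count_tables(JS_SHELL_HTML))  # (0, 0): empty lists, not None
```

If the real page behaves like the second case, the tables are only created in the browser, and you need either a browser-driving tool or the underlying data source.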

Answer by MendelG (best answer)

If you open your browser's developer tools and look at the HTTP requests in the Network tab, you'll see that the data is loaded from a different URL:

https://docs.qmk.fm/keycodes.md?cache-bust=1706627991267
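The page URL in the question ends in #/keycodes, and the file it loads is keycodes.md. If the other pages you want follow the same pattern (an assumption based on this single example), you can derive the Markdown URL from the page URL instead of looking it up in the Network tab each time. A minimal standard-library sketch; markdown_url_for is a hypothetical helper, and dropping the cache-bust parameter assumes the server does not require it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url_for(page_url: str) -> str:
    """Guess the Markdown file behind a hash-routed docs page (assumed URL pattern)."""
    parts = urlsplit(page_url)
    # "#/keycodes?id=..." -> "keycodes": drop the leading slash and any ?id=... suffix.
    route = parts.fragment.lstrip("/").split("?", 1)[0]
    if not route:
        raise ValueError(f"no #/route fragment in {page_url!r}")
    base_path = parts.path if parts.path.endswith("/") else parts.path + "/"
    # Assumption: the cache-bust query string the browser adds is optional, so omit it.
    return urlunsplit((parts.scheme, parts.netloc, base_path + route + ".md", "", ""))


print(markdown_url_for("https://docs.qmk.fm/#/keycodes"))
# https://docs.qmk.fm/keycodes.md
```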

The catch is that it's really a Markdown (.md) file, not HTML. However, at least you get the original source file.

So there isn't really any HTML to parse to get the data into a readable format.
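Since the data arrives as Markdown, the tables are presumably pipe tables (a header row, then a |---| delimiter row) rather than <table> elements, so one option is to parse them directly. A minimal standard-library sketch under that assumption; parse_markdown_tables and split_row are hypothetical helpers, and the sample text in the usage below is made up for illustration, not copied from keycodes.md.

```python
import re

# Matches a pipe-table delimiter row such as |---|:---:|---:|
_DELIMITER_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def split_row(line: str) -> list[str]:
    """Split one pipe-table row into stripped cells, honouring escaped pipes (\\|)."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|") and not line.endswith("\\|"):
        line = line[:-1]
    # Split only on pipes that are not preceded by a backslash, then unescape.
    cells = re.split(r"(?<!\\)\|", line)
    return [cell.strip().replace("\\|", "|") for cell in cells]


def parse_markdown_tables(md_text: str) -> list[list[dict[str, str]]]:
    """Return every pipe table in md_text as a list of row dicts keyed by header."""
    lines = md_text.splitlines()
    tables = []
    i = 0
    while i < len(lines) - 1:
        header, delimiter = lines[i], lines[i + 1]
        if "|" in header and "|" in delimiter and _DELIMITER_ROW.match(delimiter):
            columns = split_row(header)
            rows = []
            i += 2
            # Body rows continue until the first line without a pipe (e.g. a blank line).
            while i < len(lines) and "|" in lines[i]:
                rows.append(dict(zip(columns, split_row(lines[i]))))
                i += 1
            tables.append(rows)
        else:
            i += 1
    return tables
```

To run it against the real file, you could pass in requests.get("https://docs.qmk.fm/keycodes.md").text, assuming that URL still serves the file without the cache-bust parameter. Converting the Markdown back to HTML and reusing select("table") is another route, but it needs a third-party Markdown library with table support.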