Problem with find_all in BeautifulSoup4

52 Views Asked by Maximiliano Machado Goncalves At 12 February 2024 at 21:21

I want to extract information from the following website: book titles, codes, prices, and so on. As an example, let's concentrate on ISBN codes. I want to find every piece of text in the HTML that contains the word "ISBN".

My code is the following:

import requests
from bs4 import BeautifulSoup

url = 'https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1'

result = requests.get(url)

doc = BeautifulSoup(result.text, "html.parser")

aux = doc.find_all(string="ISBN")

My problem is that the result aux is empty: find_all returns nothing for "ISBN", even though I can see the word when I look at the HTML.


There are 2 best solutions below

Thotsawat J. On 13 February 2024 at 03:03 BEST ANSWER

This may not be the best way, but it is an alternative: I'm using a lambda to select the div tags whose text contains "ISBN".

import requests
from bs4 import BeautifulSoup

url = 'https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1'
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
# Find div elements whose text (including nested tags) contains "ISBN"
aux = doc.find_all(lambda tag: tag.name == 'div' and "ISBN" in tag.text)
for element in aux:
    print(element.text)
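
One thing to keep in mind with this approach: tag.text includes the text of all nested tags, so every enclosing div matches as well, not only the innermost one. A minimal sketch of narrowing the result, using a made-up inline HTML fragment instead of the live page (the markup and the helper name has_isbn are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: an outer div wrapping the div that holds the ISBN.
html = (
    '<div class="row">'
    '<div class="col">Some title</div>'
    '<div class="col">ISBN <strong>9789090374475</strong></div>'
    '</div>'
)
doc = BeautifulSoup(html, "html.parser")


def has_isbn(tag):
    """True for div tags whose text (including descendants) contains 'ISBN'."""
    return tag.name == "div" and "ISBN" in tag.get_text()


# Matches both the outer row div and the inner column div.
all_matches = doc.find_all(has_isbn)

# Keep only the divs that have no matching div nested inside them.
innermost = [tag for tag in all_matches if tag.find(has_isbn) is None]

for tag in innermost:
    print(tag.get_text())
```

Whether this filtering is needed depends on how deeply the real page nests its div elements.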

Andrej Kesely On 13 February 2024 at 00:18

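As stated in the comment, you can use the re module to search for strings. find_all(string="ISBN") only matches text nodes that are exactly "ISBN", and the ones on this page have surrounding whitespace, so a regular expression is needed: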

import re

import requests
from bs4 import BeautifulSoup

url = "https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1"
result = requests.get(url)

doc = BeautifulSoup(result.text, "html.parser")
aux = doc.find_all(string=re.compile("ISBN"))

print(aux)

Prints:

['\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ']
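
The difference between the exact-string search and the regular-expression search can be reproduced without hitting the website. The following is a small sketch on a made-up inline HTML fragment (the markup is an assumption modelled on the whitespace visible in the output above); it also shows a function filter as a third option:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment: the "ISBN" label is surrounded by whitespace.
html = """<div>
 ISBN
 <strong>9789090374475</strong>
</div>"""
doc = BeautifulSoup(html, "html.parser")

# Exact match: the text node is "\n ISBN\n ", not "ISBN", so nothing is found.
exact = doc.find_all(string="ISBN")

# Regex match: re.compile("ISBN") matches any text node containing "ISBN".
by_regex = doc.find_all(string=re.compile("ISBN"))

# Function match: an equivalent substring test without the re module.
by_func = doc.find_all(string=lambda s: s is not None and "ISBN" in s)

print(exact)
print([s.strip() for s in by_regex])
print([s.strip() for s in by_func])
```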

It is more useful, though, to search for the HTML tags that contain the string "ISBN":

for tag in doc.select(':-soup-contains-own("ISBN")'):
    print(tag.prettify())

Prints:


...

<div class="col-12 col-md-4 text-right">
 <strong>
  <span class="price">
   € 24.95
  </span>
 </strong>
 <br/>
 Van
 <strong>
  01-10-2023
 </strong>
 t.e.m.
 <strong>
  01-04-2024
 </strong>
 <br/>
 ISBN
 <strong>
  9789090374475
 </strong>
 <br/>
</div>

...
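
Given markup like the above, where the number sits in the strong tag that follows the "ISBN" label, one possible way to pull out just the ISBN values is to locate each label and then step to the next strong tag. This is only a sketch on an inline copy of the fragment (trimmed for brevity), and it assumes the real page keeps this label-then-strong layout:

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the printed fragment, used here instead of the live page.
html = """
<div class="col-12 col-md-4 text-right">
 <strong><span class="price">€ 24.95</span></strong>
 <br/>
 ISBN
 <strong>9789090374475</strong>
 <br/>
</div>
"""
doc = BeautifulSoup(html, "html.parser")

isbns = []
for label in doc.find_all(string=re.compile("ISBN")):
    # find_next walks forward in document order from the label text node.
    value = label.find_next("strong")
    if value is not None:
        isbns.append(value.get_text(strip=True))

print(isbns)
```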
