Problem with find_all in BeautifulSoup4

I want to scrape information from the following website: book titles, codes, prices, etc. As an example, let's concentrate on ISBN codes: I want to find every piece of text in the HTML that contains the word "ISBN".

My code is the following:

import requests
from bs4 import BeautifulSoup

url = 'https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1'

result = requests.get(url)

doc = BeautifulSoup(result.text, "html.parser")

aux = doc.find_all(string="ISBN")

My problem is that the result, aux, is empty: find_all finds nothing with "ISBN", even though I can see the word when I look at the HTML.

There are 2 best solutions below

0
Thotsawat J. On BEST ANSWER

This may not be the best way, but it is an alternative: I'm using a lambda to select the div tags whose text contains "ISBN".

import requests
from bs4 import BeautifulSoup

url = 'https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1'
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
# Find every <div> whose text contains "ISBN" (this also matches ancestor divs)
aux = doc.find_all(lambda tag: tag.name == 'div' and "ISBN" in tag.text)
for element in aux:
    print(element.text)
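
Because tag.text includes the text of all descendants, the lambda above also matches every outer div that wraps an ISBN block. A minimal sketch of one way to narrow it down, assuming it is enough to look only at a div's direct text children; the HTML snippet and the helper name has_own_isbn_text are made up for illustration and are not taken from the site:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on one result row of the page.
html = """
<div class="row">
  <div class="col-12 col-md-8">Some title</div>
  <div class="col-12 col-md-4 text-right">
    ISBN <strong>9789090374475</strong>
  </div>
</div>
"""
doc = BeautifulSoup(html, "html.parser")


def has_own_isbn_text(tag):
    """True for a <div> that has "ISBN" in one of its direct text children."""
    if tag.name != "div":
        return False
    # recursive=False restricts the search to direct children of the tag.
    return any("ISBN" in s for s in tag.find_all(string=True, recursive=False))


# Broad filter from the answer: matches the outer row and the inner column.
broad = doc.find_all(lambda tag: tag.name == "div" and "ISBN" in tag.text)
# Narrow filter: matches only the div that holds the "ISBN" text itself.
narrow = doc.find_all(has_own_isbn_text)

print(len(broad), len(narrow))
```

With the sample markup, the broad filter returns both the row and the column div, while the narrow one returns only the column that actually holds the ISBN.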
1
Andrej Kesely On

As stated in the comments, you can use the re module to search for strings that contain "ISBN" (string="ISBN" only matches text nodes that are exactly "ISBN"):

import re

import requests
from bs4 import BeautifulSoup

url = "https://www.boekenprijs.be/uitgebreid-zoeken?zoek=&veld=all&gbpstartdatumvan=&gbpstartdatumtotenmet=&gbpeinddatumvan=01/04/2024&gbpeinddatumtotenmet=12/08/2024&_token=FAoSCCoUK-SPrL-ktj4MtsVBv3L4K-FaH3jxSo259D0&page=1"
result = requests.get(url)

doc = BeautifulSoup(result.text, "html.parser")
aux = doc.find_all(string=re.compile("ISBN"))

print(aux)

Prints:

['\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ', '\n                        ISBN ']
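
The output above also shows why the original call returned nothing: each text node is "ISBN" surrounded by whitespace, and a plain string passed to string= has to equal the whole text node. The following offline sketch illustrates the difference; the one-line HTML snippet is an assumption that imitates the page, not a copy of it:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment: the label is padded with whitespace, as on the page.
html = "<div>\n    ISBN <strong>9789090374475</strong></div>"
doc = BeautifulSoup(html, "html.parser")

# Exact match: the text node is "\n    ISBN ", not "ISBN", so nothing is found.
exact = doc.find_all(string="ISBN")

# Regular expression: matches any text node that contains "ISBN".
by_regex = doc.find_all(string=re.compile("ISBN"))

# Function: strip the whitespace first, then compare exactly.
by_func = doc.find_all(string=lambda s: s is not None and s.strip() == "ISBN")

print(exact)
print(by_regex)
print(by_func)
```

The function form is an alternative when the regular expression would be too permissive, for example if other text on the page merely mentions "ISBN" inside a longer sentence.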

It is more useful, though, to search for the HTML tags that directly contain the string "ISBN":

for tag in doc.select(':-soup-contains-own("ISBN")'):
    print(tag.prettify())

Prints:


...

<div class="col-12 col-md-4 text-right">
 <strong>
  <span class="price">
   € 24.95
  </span>
 </strong>
 <br/>
 Van
 <strong>
  01-10-2023
 </strong>
 t.e.m.
 <strong>
  01-04-2024
 </strong>
 <br/>
 ISBN
 <strong>
  9789090374475
 </strong>
 <br/>
</div>

...
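
Once the enclosing tag is found, the ISBN number and the price can be read from it. This is only a sketch: it assumes every block has the same layout as the output printed above (the number in the first strong tag after the "ISBN" label, the price in span.price), and it runs on a pasted copy of that output rather than on the live page. The records list and its keys are names chosen here for illustration.

```python
import re

from bs4 import BeautifulSoup

# Sample block copied from the printed output above (whitespace condensed).
html = """
<div class="col-12 col-md-4 text-right">
 <strong><span class="price">€ 24.95</span></strong>
 <br/>
 Van <strong>01-10-2023</strong> t.e.m. <strong>01-04-2024</strong>
 <br/>
 ISBN <strong>9789090374475</strong>
 <br/>
</div>
"""
doc = BeautifulSoup(html, "html.parser")

records = []
for tag in doc.select(':-soup-contains-own("ISBN")'):
    # The text node holding the label, e.g. "\n ISBN ".
    label = tag.find(string=re.compile("ISBN"))
    # Assumption: the number sits in the first <strong> after the label.
    isbn = label.find_next("strong").get_text(strip=True)
    price = tag.select_one("span.price")
    records.append({
        "isbn": isbn,
        "price": price.get_text(strip=True) if price else None,
    })

print(records)
```

On the real page the same loop would run over the doc built from result.text; if the layout differs between entries, the find_next("strong") step is the part most likely to need adjusting.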