Selenium Python extracting information from a website and dumping it into JSON Format

Question

308 Views Asked by Mohammad Sakaamini At 23 July 2022 at 15:06

I'm trying to open the hotel website www.booking.com and extract the name, price, location, and link of the top 50 search results, sorted by cheapest first. I'm using Selenium with Python to automate the process. However, some HTML elements are targetable while others are not. After inspecting the website, I realized that all hotel names have the class name: fcab3ed991 a23c043802

I tried to target all of them and put them into an array, as seen in my code below, but I can't seem to target the elements correctly. What am I doing wrong?

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

PATH= "C:\Program Files (x86)\chromedriver.exe"
driver=webdriver.Chrome(PATH)
driver.get("https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB&sid=8005d0cc6b75af8d0d2e74451b73cb8b&aid=304142&sb=1&sb_lp=1&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB%26sid%3D8005d0cc6b75af8d0d2e74451b73cb8b%26sb_price_type%3Dtotal%26%26&ss=Jumeirah%2C+Dubai%2C+Dubai+Emirate%2C+United+Arab+Emirates&is_ski_area=&checkin_year=2022&checkin_month=8&checkin_monthday=1&checkout_year=2022&checkout_month=8&checkout_monthday=3&group_adults=2&group_children=0&no_rooms=1&map=1&b_h4u_keep_filters=&from_sf=1&ss_raw=jum&ac_position=1&ac_langcode=en&ac_click_type=b&dest_id=941&dest_type=district&place_id_lat=25.205553&place_id_lon=55.239216&search_pageview_id=c0ac477da63f02c2&search_pageview_id=c0ac477da63f02c2&search_selected=true&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=0&order=price#map_closed")


try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "d4924c9e74"))
    )

    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "fcab3ed991 a23c043802"))
    )
    names=element.find_elements_by_class_name("fcab3ed991 a23c043802")
except:
    driver.quit()

Original Q&A

There is 1 best solution below

**undetected Selenium** · Answer 1 · 2022-07-23T15:57:23.423000

To extract the text of the name and price fields, you can use a list comprehension with the following locator strategies:

Code block:

driver.get("https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB&sid=8005d0cc6b75af8d0d2e74451b73cb8b&aid=304142&sb=1&sb_lp=1&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB%26sid%3D8005d0cc6b75af8d0d2e74451b73cb8b%26sb_price_type%3Dtotal%26%26&ss=Jumeirah%2C+Dubai%2C+Dubai+Emirate%2C+United+Arab+Emirates&is_ski_area=&checkin_year=2022&checkin_month=8&checkin_monthday=1&checkout_year=2022&checkout_month=8&checkout_monthday=3&group_adults=2&group_children=0&no_rooms=1&map=1&b_h4u_keep_filters=&from_sf=1&ss_raw=jum&ac_position=1&ac_langcode=en&ac_click_type=b&dest_id=941&dest_type=district&place_id_lat=25.205553&place_id_lon=55.239216&search_pageview_id=c0ac477da63f02c2&search_pageview_id=c0ac477da63f02c2&search_selected=true&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=0&order=price#map_closed")
# Wait up to 20 seconds for the matching elements to become visible, then read their text.
names = [elem.text for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[data-testid='title']")))]
prices = [elem.text for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[data-testid='price-and-discounted-price'] > span")))]
# Pair each hotel name with the price at the same position.
for name, price in zip(names, prices):
    print(f"{name} hotel price is {price}")

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output:

Royal Prestige Hotel hotel price is ₹ 10,871
Rove La Mer Beach hotel price is ₹ 10,328
Dubai Marine Beach Resort & Spa hotel price is ₹ 12,133
Roda Beach Resort hotel price is ₹ 16,525
Bespoke Residences - 3 Bedroom Waikiki Townhouses hotel price is ₹ 20,395
Walking distance to Burj al Arab - 1BR Lamtara 2 hotel price is ₹ 16,724
Mandarin Oriental Jumeira, Dubai hotel price is ₹ 18,108
Four Seasons Resort Dubai at Jumeirah Beach hotel price is ₹ 20,003
Bulgari Resort, Dubai hotel price is ₹ 78,274
Spacious Villa! hotel price is ₹ 62,619
Palm Beach Hotel hotel price is ₹ 64,794
York International Hotel hotel price is ₹ 86,971
Moon , Backpackers , Partition for Couples and for singles hotel price is ₹ 208,731
Hafez Hotel Apartments Al Ras Metro Station hotel price is ₹ 2,022
Grand Pearl Hostel For Boys hotel price is ₹ 2,131
Time Palace Hotel Branch hotel price is ₹ 3,131
Hostel Youth hotel price is ₹ 3,157
Grand Mayfair Hotel hotel price is ₹ 3,601
Explore Old Dubai, Souks, Tastings, Museums hotel price is ₹ 4,592
Panorama Hotel Bur Dubai hotel price is ₹ 3,674
Zain International Hotel hotel price is ₹ 3,827
Panorama Hotel Deira hotel price is ₹ 3,870
Decent Boys Hostel in center of Bur Dubai next to Burjuman metro Station with all FREE Facilities hotel price is ₹ 3,875
Brand New Boys Hostel 1 min walk from Burjuman Metro Station EXIT-4 with all Brand New Furnishings & Free Facilities hotel price is ₹ 3,914
OYO 338 Transworld Hotel hotel price is ₹ 3,914
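
On the original attempt: By.CLASS_NAME expects a single class name, so passing two space-separated classes such as "fcab3ed991 a23c043802" will not match as intended. In addition, find_elements_by_class_name is deprecated in Selenium 4 in favour of find_elements(By.CLASS_NAME, ...). If matching on both classes is really needed, a CSS selector that chains them is one option. The helper below is only an illustrative sketch; compound_class_to_css is a hypothetical name, not part of Selenium. Hashed class names like these also look auto-generated and may change between page builds, which is presumably why the locators above target data-testid attributes instead.

```python
def compound_class_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a chained CSS selector."""
    # "a b" -> ".a.b": matches elements that carry every listed class.
    return "".join(f".{name}" for name in class_attr.split())


# Hypothetical usage with the class names from the question.
selector = compound_class_to_css("fcab3ed991 a23c043802")
print(selector)  # .fcab3ed991.a23c043802
```

The resulting string could then be passed as (By.CSS_SELECTOR, selector) wherever the question used (By.CLASS_NAME, "fcab3ed991 a23c043802").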

PS: Following this approach, you can similarly extract the location and link of each result and dump everything into a JSON file.
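
A minimal sketch of that last step, assuming the names and prices are collected as in the code block above. The selectors for location and link (span[data-testid='address'] and a[data-testid='title-link']) are assumptions that were not verified against the live page, and build_records, to_json, and scrape are hypothetical helper names introduced here for illustration.

```python
import json


def build_records(names, prices, locations, links):
    """Combine four parallel lists into one dict per hotel."""
    return [
        {"name": n, "price": p, "location": loc, "link": url}
        for n, p, loc, url in zip(names, prices, locations, links)
    ]


def to_json(records):
    """Serialize the records; ensure_ascii=False keeps symbols such as ₹ readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def scrape(driver, limit=50):
    """Collect the four fields from an already-loaded results page."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, 20)

    def texts(css):
        # Wait until every element matching the selector is visible.
        found = wait.until(
            EC.visibility_of_all_elements_located((By.CSS_SELECTOR, css))
        )
        return [el.text for el in found]

    names = texts("div[data-testid='title']")
    prices = texts("div[data-testid='price-and-discounted-price'] > span")
    # ASSUMPTION: the next two selectors are guesses, not taken from the answer.
    locations = texts("span[data-testid='address']")
    links = [
        el.get_attribute("href")
        for el in driver.find_elements(By.CSS_SELECTOR, "a[data-testid='title-link']")
    ]
    return build_records(names, prices, locations, links)[:limit]
```

With a driver that has already loaded the results page, to_json(scrape(driver)) returns a string that can be written to a file opened with encoding="utf-8". Because the four lists are zipped by position, a result that lacks one field (or shows two price spans) would shift the pairing; iterating over each result card and querying inside it would be more robust, at the cost of one more selector assumption.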
