Scraping information in a span located under nested span


I want to get live weather data with web scraping. I was thinking about using BeautifulSoup for this.

<span class="Column--precip--3JCDO">
  <span class="Accessibility--visuallyHidden--H7O4p">Chance of Rain</span>
  3%
</span>

I want to get the 3% out of this container. I already managed to get data from the website using this code snippet for another section.

temp_value = soup.find("span", {"class":"CurrentConditions--tempValue--MHmYY"}).get_text(strip=True)

I tried the same for rain_forecast:

rain_forecast = soup.find("span", {"class": "Column--precip--3JCDO"}).get_text(strip=True)

But print(rain_forecast) outputs -- in my console. The only difference I can see is that the text I want sits next to another, nested span inside the target span.

Another approach I came across on Stack Overflow is to use Selenium, on the theory that the data has not been loaded yet when the page is fetched, which is why the output is --.

But I don't know if this is overkill for my application, or if there is a simpler solution to this problem.


There are 2 best solutions below

1
Andrej Kesely On BEST ANSWER

If you want to get the table of today's forecast, you can use this example:

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}

url = "https://weather.com/en-IN/weather/today/l/a0e0a5a98f7825e44d5b44b26d6f3c2e76a8d70e0426d099bff73e764af3087a"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# Collect one row per time-of-day card; each direct child of the link is one column.
today_forecast = []
for a in soup.select(".TodayWeatherCard--TableWrapper--globn a"):
    today_forecast.append(
        [t.get_text(strip=True, separator=" ") for t in a.find_all(recursive=False)]
    )

df = pd.DataFrame(
    today_forecast, columns=["Time of day", "Degrees", "Text", "Chance of rain"]
)

print(df)

Prints:

  Time of day Degrees                 Text          Chance of rain
0     Morning    11 °        Partly Cloudy                      --
1   Afternoon    20 °        Partly Cloudy                      --
2     Evening    14 °  Partly Cloudy Night  Rain Chance of Rain 3%
3   Overnight    10 °               Cloudy  Rain Chance of Rain 5%
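
The "Chance of rain" column above still carries the hidden accessibility label next to the number. As a rough sketch, the percentage could be pulled out with a regular expression. The helper name parse_rain_chance and the sample cells list are hypothetical and only mirror the printed table; they are not part of the answer's code.

```python
import re


def parse_rain_chance(cell):
    """Return the percentage as an int, or None when the cell has no value (e.g. "--")."""
    match = re.search(r"(\d+)\s*%", cell)
    return int(match.group(1)) if match else None


# Sample values copied from the "Chance of rain" column printed above.
cells = ["--", "--", "Rain Chance of Rain 3%", "Rain Chance of Rain 5%"]
chances = [parse_rain_chance(c) for c in cells]
print(chances)  # [None, None, 3, 5]
```

With the DataFrame from the example, the same helper could presumably be applied via df["Chance of rain"].map(parse_rain_chance).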
0
Muhammad Ahsan Iqbal On
from bs4 import BeautifulSoup

# Assuming you have your HTML content in 'html_content'
soup = BeautifulSoup(html_content, 'html.parser')

# Find the parent span and extract the text, excluding the nested span's text
rain_forecast = soup.find("span", {"class": "Column--precip--3JCDO"}).contents[-1].strip()

print(rain_forecast)
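
One possible reason for the -- in the question is that soup.find() returns only the first matching span, and the first precipitation cell on the page may simply contain -- (as the Morning and Afternoon rows in the other answer suggest). The sketch below, under that assumption, loops over every match and keeps only the text that sits directly inside the parent span. The html_content string is made up from the question's snippet plus one assumed -- cell, and direct_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed sample markup: the first cell has no forecast, the second is the question's snippet.
html_content = """
<span class="Column--precip--3JCDO">--</span>
<span class="Column--precip--3JCDO">
  <span class="Accessibility--visuallyHidden--H7O4p">Chance of Rain</span>
  3%
</span>
"""


def direct_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    parts = [
        str(child).strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    return " ".join(p for p in parts if p)


soup = BeautifulSoup(html_content, "html.parser")
spans = soup.find_all("span", {"class": "Column--precip--3JCDO"})
rain_forecasts = [direct_text(s) for s in spans]
print(rain_forecasts)  # ['--', '3%']
```

Unlike contents[-1], this does not depend on the wanted text being the last child of the span.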