Hi, all!
Double question here.
I'm learning web scraping and am trying it out with a custom project for a game that I play. The URL that I am trying to scrape is this: https://paladins.guru/profile/4894277-TofuCookies/champions.
I run into two issues:
Issue 1: Splinter Can't Click "Privacy / Cookie Button"
There is a pop-up that appears every time I open the webpage with my code. I clicked Inspect on the button and tried .find_by_text() and .find_by_id() using classes, to no avail. (The other .find_by methods don't make sense in this context.) I think the issue is that the pop-up's buttons are generated via JavaScript, so Splinter can't find them. The pop-up is NOT in a new window, either.
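A minimal sketch of one approach to this, not a confirmed fix: Splinter's `is_element_present_by_css` accepts a `wait_time`, so it can poll until JavaScript has injected the button before clicking it. The helper name `click_when_present` is hypothetical, and the `.css-47sehv` selector is only an assumption taken from the class in the commented-out attempt in the code below.

```python
def click_when_present(browser, css_selector, wait_time=10):
    """Click the first element matching css_selector once it appears.

    Returns True if an element was found and clicked, False if nothing
    matched within wait_time seconds.
    """
    # is_element_present_by_css polls the page for up to wait_time seconds,
    # which gives JavaScript-rendered elements a chance to show up.
    if browser.is_element_present_by_css(css_selector, wait_time=wait_time):
        browser.find_by_css(css_selector).first.click()
        return True
    return False


# Hypothetical usage; the selector is a guess and may not match the real button:
# click_when_present(browser, '.css-47sehv')
```

Class names of the form `css-...` are often auto-generated and may change between site builds, so a selector based on the button's text or role might turn out to be more stable.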
Issue 2: How to "Wait" in Splinter?
Every character block (highlighted in yellow) has its own accompanying advanced table of information that I want to scrape (outlined in red). However, you can only view one advanced table at a time, meaning that if I wanted to view the table of the second character, "Yagorath", I would need to click on the "Yagorath" div, which would hide the table corresponding to "Jenos" and reveal the "Yagorath" table.
In my code, I believe I have successfully clicked on the relevant div tags, but there is a loading time before that advanced table shows up on the page. Right now I'm just speed-clicking through the divs, and since the tables never get a chance to load before I click on the next div, the scraper returns empty values for everything.
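The kind of wait described here can be sketched as a small polling helper, assuming nothing about the site beyond what is stated above. The name `wait_until` and its parameters are hypothetical, and the predicate in the usage comment is only a guess at what signals that the table has loaded.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or timeout elapses.

    Returns the truthy value, or None if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Hypothetical usage after clicking a champion row:
# wait_until(lambda: not browser.find_by_css('.percentile-stat__value').is_empty())
# soup = bs(browser.html, 'html.parser')  # re-parse after the page has changed
```

One thing this makes visible: `soup` in the code below is built once from `browser.html` before any clicks, so it is a snapshot of the page at that moment. Any table that loads later would only appear in a soup that is re-parsed after the wait.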
Here is my code in case anyone wants to check.
# Import dependencies
from splinter import Browser
from bs4 import BeautifulSoup as bs
from webdriver_manager.chrome import ChromeDriverManager

# Connecting to site
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

# Visiting site
url = 'https://paladins.guru/profile/4894277-TofuCookies/champions'
browser.visit(url)

# Fetch raw html, parse into BS object
soup = bs(browser.html, 'html.parser')

# Create storage variables for building df
all_df_headers = ['Champion']
all_champ_data = []
is_first_champion = True

# Click the privacy agree button
# browser.click_link_by_id('div[class=" css-47sehv"]')
# ------- NEED TO CLICK THE ACCEPT BUTTON HERE -----------

# Scrape every champion row from BS object
champ_rows = soup.find_all('div', class_ = 'row champion-table__row')
for champ in champ_rows:
    # Temp storage variables
    curr_champ_data = []

    # Scrape champion name
    curr_name = champ.find('div', class_ = 'row__champion__name').text
    curr_champ_data.append(curr_name)

    # Click to reveal statistics (but only if its not the first champion because site comes loaded with stats revealed for first champion)
    if is_first_champion == False:
        target = 'div[class="row champion-table__row"]'
        browser.find_by_tag(target).click()
        # ------- NEED A WAIT TIMER HERE -----------

    # Scrape every statistic per champion
    statistics = champ.find_all('div', class_ = 'column col-2 col-sm-4')
    for stat in statistics:
        # Scrape statistic header (but only if its the first champion)
        if is_first_champion == True:
            curr_header = stat.find('div', class_='percentile-stat__label text-ellipsis text-uppercase').text.strip()
            if curr_header not in all_df_headers:
                all_df_headers.append(curr_header)

        # Scrap statistic value
        curr_value = stat.find('div', class_='percentile-stat__value c-help col-11').text.strip()
        curr_champ_data.append(curr_value)

    # Append finished champ data list to mega list
    all_champ_data.append(curr_champ_data)

    # Adjust boolean so no longer scrap header for all subsequent champions (since all champs will have same statistics headers)
    is_first_champion = False

# Close browser when done
browser.quit()


I can't answer your actual question, but I can help you get the data you want. (This was too long for a comment.)
Playing around with that site, I found some backend APIs that will help you get the data you want. First, there is a URL with all the data for loads of stats: https://api.paladins.guru/v3/profiles/4894277-TofuCookies/champions
There is also a URL for the champions (you'll need this to get each name from the code in the link above): https://api.paladins.guru/v3/champions
You can get the data you want like this: