Splinter: Click a Pop-Up Cookie/Privacy Agreement Button + Wait for an Element to Load?


Hi, all!

Double question here.

I'm learning web scraping and am trying it out with a custom project for a game that I play. The url that I am trying to scrape from is this: https://paladins.guru/profile/4894277-TofuCookies/champions.

I run into two issues:

Issue 1: Splinter Can't Click the Privacy/Cookie Button

[Screenshot: the cookie/privacy consent pop-up]

There is a pop-up that appears every time I open the webpage with my code. I clicked Inspect on the button and tried .find_by_text() and .find_by_id() with the classes I found, to no avail. (The other .find_by_* methods don't make sense in this context.) I think the issue is that the pop-up's buttons are generated via JavaScript, so Splinter can't find them to scrape. The pop-up is NOT in a new window, either.
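From skimming the Splinter docs, I think the shape of what I need is something like the sketch below, though I haven't gotten it to actually click the button yet. (The div.css-47sehv selector is just what Inspect shows for the pop-up button, so treat it as an assumption; it looks auto-generated and may not be stable.)

from splinter import Browser
from webdriver_manager.chrome import ChromeDriverManager

executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)
browser.visit('https://paladins.guru/profile/4894277-TofuCookies/champions')

# is_element_present_by_css() polls the DOM for up to wait_time seconds,
# so it doubles as a wait for the JavaScript-generated pop-up to appear.
# NOTE: 'div.css-47sehv' is copied from Inspect and may change.
if browser.is_element_present_by_css('div.css-47sehv', wait_time=10):
    browser.find_by_css('div.css-47sehv').first.click()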

Issue 2: How to "Wait" in Splinter?

[Screenshot: champion rows highlighted in yellow, with the advanced stats table outlined in red]

So every champion block (highlighted in yellow) has its own accompanying advanced table of information that I want to scrape (outlined in red). However, you can only view one advanced table at a time: to view the table for the second champion, "Yagorath", I would need to click on the "Yagorath" div, which hides the table corresponding to "Jenos" and reveals the "Yagorath" table.

In my code, I believe I am successfully clicking the relevant div tags, but the advanced table takes time to load. Since I'm just speed-clicking through the divs, the tables never get a chance to load before I click the next one, and the scraper returns empty values for everything.
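From the docs, Splinter's is_element_present_by_css() accepts a wait_time argument and polls until the selector matches, so I'm guessing the fix is something along these lines after each click (reusing the browser and bs from my script below; div.percentile-stat__value is the class my scraper reads, and presumably I'd also need to re-parse browser.html after each click, since my soup is built once up front):

# Wait up to 10 seconds for the clicked champion's stats table to render;
# is_element_present_by_css() polls and returns True as soon as it matches.
if browser.is_element_present_by_css('div.percentile-stat__value', wait_time=10):
    # Re-parse the now-updated page; the original soup predates the click.
    soup = bs(browser.html, 'html.parser')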

Here is my code in case anyone wants to check.

# Import dependencies
from splinter import Browser
from bs4 import BeautifulSoup as bs
from webdriver_manager.chrome import ChromeDriverManager

# Connecting to site
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

# Visiting site
url = 'https://paladins.guru/profile/4894277-TofuCookies/champions'
browser.visit(url)

# Fetch raw html, parse into BS object
soup = bs(browser.html, 'html.parser')

# Create storage variables for building df
all_df_headers = ['Champion']
all_champ_data = []
is_first_champion = True

# Click the privacy agree button
# browser.click_link_by_id('div[class=" css-47sehv"]')
# ------- NEED TO CLICK THE ACCEPT BUTTON HERE -----------

# Scrape every champion row from BS object
champ_rows = soup.find_all('div', class_ = 'row champion-table__row')
for champ in champ_rows:
    # Temp storage variables
    curr_champ_data = []

    # Scrape champion name
    curr_name = champ.find('div', class_ = 'row__champion__name').text
    curr_champ_data.append(curr_name)

    # Click to reveal statistics (skip the first champion: the page loads
    # with the first champion's stats already revealed)
    if not is_first_champion:
        target = 'div[class="row champion-table__row"]'
        browser.find_by_css(target).click()  # target is a CSS selector, so find_by_css rather than find_by_tag
        # ------- NEED A WAIT TIMER HERE ----------- 
    
    # Scrape every statistic per champion
    statistics = champ.find_all('div', class_ = 'column col-2 col-sm-4')
    for stat in statistics:
        # Scrape the statistic header (only for the first champion)
        if is_first_champion:
            curr_header = stat.find('div', class_='percentile-stat__label text-ellipsis text-uppercase').text.strip()
            if curr_header not in all_df_headers:
                all_df_headers.append(curr_header)

        # Scrape the statistic value
        curr_value = stat.find('div', class_='percentile-stat__value c-help col-11').text.strip()
        curr_champ_data.append(curr_value)
    
    # Append finished champ data list to mega list
    all_champ_data.append(curr_champ_data)

    # Flip the flag so headers aren't re-scraped for subsequent champions (all champions share the same statistic headers)
    is_first_champion = False

# Close browser when done
browser.quit()

1 Answer

Answered by childnick:

I can't answer your actual question, but I can help you get the data you want (this was too long for a comment).

Playing around with that site, I found some backend APIs that will get you the data you want. First, there is an endpoint with all of the stats data: https://api.paladins.guru/v3/profiles/4894277-TofuCookies/champions

There is also an endpoint listing every champion (you'll need it to map the champion IDs in the stats response to names): https://api.paladins.guru/v3/champions

You can get the data you want like this:

import requests
import pandas as pd

champ_url = 'https://api.paladins.guru/v3/champions'
champs = requests.get(champ_url).json()

stats_url = 'https://api.paladins.guru/v3/profiles/4894277-TofuCookies/champions'
stats = requests.get(stats_url).json()

details = []
for champ in stats['champions']['-1']:  # '-1' seems to be the all-queues total; the 7 other queue keys are encoded: 424, 428, 237, 452, 465, 469, 468
    info = stats['champions']['-1'][champ]['total']
    info['champion'] = champs[champ]['name']  # add the real name from the champs data above

    details.append(info)

df = pd.DataFrame(details)
df.to_csv('paladins_data.csv', index=False)
print('Saved to paladins_data.csv')
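Going straight to the backend API also conveniently side-steps both of your original issues: with requests there's no browser involved, so there's no cookie pop-up to click and no JavaScript rendering to wait for.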