Length of find_elements_by_xpath is incorrect

In the link below, I am trying to collect the total number of games in the 2023-24 Regular Season table.

https://www.basketball-reference.com/players/j/jokicni01/gamelog/2024

I have the variable for those elements set to total_games. My issue is that when I run print(len(total_games)), I get an output of 113.

total_games = driver.find_elements_by_xpath('//tbody/tr[@id and @data-row]')
print(len(total_games))

I have manually inspected the elements on the page and done a search for //tbody/tr[@id and @data-row] and even in the search results it shows only 66 entries (accurate as of Mar 19, 2024, will increase as the season continues but should never exceed 82). Can anyone tell me where all of those extra entries are coming from when I run this in PyCharm?

I have also tried using total_games = driver.find_elements(By.XPATH, '//tbody/tr[@id and @data-row]') but I get the same result. I have also tried making it more specific with the following two lines but when those are used PyCharm returns a length of 0 for total_games. In both of those cases when inspecting the page manually, the correct results are returned.

total_games = driver.find_elements(By.XPATH, '//table[@id="pgl_basic"]/tbody/tr[@id and @data-row]')

and

total_games = driver.find_elements(By.XPATH, '//tbody/tr[contains(@id, "pgl_basic") and @data-row]')

maxpower8888 On BEST ANSWER

So this was a weird one. The URL was correct, but for some reason even though you could see the script going to the correct page, when it came time to collect those elements, it was still taking them from the previous page. I added a WebDriverWait function to make it wait for a specific element on the page I needed before collecting the elements and now it works.

chitown88 On

It's because there are around 8 tables in the HTML, so an XPath that starts at //tbody matches rows from all of them. A far better way to do this is to grab the stats table, then take the max value in the 'G' column if you want the player's number of games, or just use the 'RK' column or the length of the table for total games.

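import pandas as pd

url = 'https://www.basketball-reference.com/players/j/jokicni01/gamelog/2024'
# read_html returns one DataFrame per <table> it finds; [-1] takes the last one
df = pd.read_html(url)[-1]
# Drop rows where 'G' holds the literal text 'G' (header rows repeated in the body)
df = df[df['G'].ne('G')]

print(len(df))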
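To illustrate the multiple-tables point, here is a small self-contained sketch using only the standard library. The markup is invented for the example and is not the real page; the second table id, pgl_basic_playoffs, is hypothetical. ElementTree's limited XPath has no "and" inside a predicate, so the two attribute tests are chained instead.

```python
import xml.etree.ElementTree as ET

# Made-up miniature page: two tables whose rows both carry id and data-row.
PAGE = """
<html><body>
  <table id="pgl_basic"><tbody>
    <tr id="pgl_basic.1" data-row="0"><td>1</td></tr>
    <tr id="pgl_basic.2" data-row="1"><td>2</td></tr>
  </tbody></table>
  <table id="pgl_basic_playoffs"><tbody>
    <tr id="pgl_basic_playoffs.1" data-row="0"><td>1</td></tr>
  </tbody></table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Unscoped query: matches qualifying rows in every table on the page.
all_rows = root.findall(".//tbody/tr[@id][@data-row]")

# Scoped query: only rows that belong to the table with id "pgl_basic".
one_table = root.findall(".//table[@id='pgl_basic']/tbody/tr[@id][@data-row]")

print(len(all_rows))   # rows from both tables
print(len(one_table))  # rows from the first table only
```

The unscoped expression counts rows from both tables, which is the same effect that inflates the count on a page with several stats tables.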