Is it possible to blacklist elements, such as an image URL in a list, so that the program skips them on the next search and moves on to the next image on the website? I tried the following, but it always picks the already-used elements again.
used = []
while True:
    search = True
    pic = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
    time.sleep(2)
    pic_url = pic.get_attribute("src")
    pic_title = pic.get_attribute("alt")
    used.append(pic)
    time.sleep(200)
    # Second loop
    while search:
        pic = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
        if pic != used:
            search = False
            used.append(pic)
Another attempt:
while search:
    pic = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
    if pic not in used:
        search = False
        used.append(pic)
It always gets stuck at this line: pic = article.find_element(By.CSS_SELECTOR, value='.post-container a img')
while True:
    search = True
    driver.switch_to.window(gagtab)
    time.sleep(2)
    driver.refresh()
    time.sleep(2)
    while search:
        feed = driver.find_element(By.CSS_SELECTOR, "div.main-wrap section#list-view-2")
        streams = feed.find_elements(By.CLASS_NAME, "list-stream")
        for stream in streams:
            # Find articles within the stream; these are the 'posts'
            articles = stream.find_elements(By.TAG_NAME, "article")
            for article in articles:
                try:
                    # Find the article title
                    title = article.find_element(By.CSS_SELECTOR, "header > a")
                except NoSuchElementException:
                    continue
        for stream in streams:
            articles = stream.find_elements(By.TAG_NAME, "article")
            for article in articles:
                try:
                    pic = article.find_element(By.CSS_SELECTOR, value='.post-container a img')
                except NoSuchElementException:
                    continue
                if pic.id in used:
                    continue
                time.sleep(2)
                pic_url = pic.get_attribute("src")
                pic_title = pic.get_attribute("alt")
                used.append(pic.id)
You're going about it the right way, but instead of storing the whole WebElement like you're doing (used.append(pic)), you should store the WebElement ID (pic.id) and do your comparison on that. Alternatively, you could store the URL you wish to skip instead of the WebElement ID.
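As a rough sketch of the skip-list idea, assuming the images are collected with driver.find_elements (plural) and that each image has a distinct src: the helper name next_unused_image is hypothetical, and it keys on the URL rather than pic.id because, as far as I know, element IDs are reissued once the page is refreshed.

```python
def next_unused_image(images, used):
    """Return (url, title) for the first image whose URL is not in `used`.

    `images` is any iterable of objects exposing get_attribute(), for example
    the list returned by driver.find_elements(By.CSS_SELECTOR, ".image-post img").
    `used` is a set of URLs and is updated in place.
    Returns None when every image has already been used.
    """
    for pic in images:
        url = pic.get_attribute("src")
        if url is None or url in used:
            continue  # blacklisted (or missing) URL: try the next image
        used.add(url)
        return url, pic.get_attribute("alt")
    return None
```

With a set named used created once outside the outer loop, each pass would call next_unused_image(images, used) and stop or wait when it returns None.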
You'll notice I used the continue keyword, which stops the current iteration and moves on to the next one. This, along with break, is one of two helpful keywords that you should learn for what you're doing. Here's a good tutorial about those.
EDIT
You don't need to loop over the streams again; you can do everything you want in your first nested articles loop.
I think with this adjustment, you won't need to store and check the IDs because you won't be iterating over the same elements again.
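A single-pass version might look roughly like this. It is only a sketch: collect_posts is a hypothetical helper, the selectors are the ones from the question, and it uses find_elements (which returns an empty list when nothing matches) in place of the try/except around find_element.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def collect_posts(articles):
    """Visit each article once and gather its title and image data together."""
    posts = []
    for article in articles:
        titles = article.find_elements(CSS, "header > a")
        pics = article.find_elements(CSS, ".post-container a img")
        if not titles or not pics:
            continue  # not a complete image post: move on to the next article
        pic = pics[0]
        posts.append({
            "title": titles[0].text,
            "url": pic.get_attribute("src"),
            "alt": pic.get_attribute("alt"),
        })
    return posts
```

Here articles would be the list from stream.find_elements(By.TAG_NAME, "article"). Because each article is visited exactly once per pass, no ID bookkeeping is needed within that pass.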