How do I blacklist elements so that find_element skips them?


Is it possible to blacklist elements, for example by keeping image URLs in a list, so that the program skips them on the next search and moves on to the next image on the website? I tried the following, but it always picks up the already-used elements again.

used = []


while True:
    search = True
    pic = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
    time.sleep(2)
    pic_url = pic.get_attribute("src")
    pic_title = pic.get_attribute("alt")
    used.append(pic)
    time.sleep(200)

    # Second loop
    while search:
        pic = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
        if pic != used:
            search = False

    used.append(pic)

Another attempt:

while search:
    pic = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
    if pic not in used:
        search = False


used.append(pic)

With the code below, it always gets stuck at this line: pic = article.find_element(By.CSS_SELECTOR, value='.post-container a img')

while True:
    search = True
    driver.switch_to.window(gagtab)
    time.sleep(2)
    driver.refresh()
    time.sleep(2)
    while search:
        feed = driver.find_element(By.CSS_SELECTOR, "div.main-wrap section#list-view-2")
        streams = feed.find_elements(By.CLASS_NAME, "list-stream")
        for stream in streams:
            # Find articles within the stream; these are the 'posts'
            articles = stream.find_elements(By.TAG_NAME, "article")
            for article in articles:

                try:
                    # Find the article title
                    title = article.find_element(By.CSS_SELECTOR, "header > a")

                except NoSuchElementException:
                    continue

        for stream in streams:

            articles = stream.find_elements(By.TAG_NAME, "article")
            for article in articles:

                try:

                    pic = article.find_element(By.CSS_SELECTOR, value='.post-container a img')

                except NoSuchElementException:
                    continue
        if pic.id in used:
            continue
    time.sleep(2)
    pic_url = pic.get_attribute("src")
    pic_title = pic.get_attribute("alt")
    used.append(pic.id)

Answer (by Lucan):

You're going about it the right way, but instead of storing the whole WebElement as you're doing now (used.append(pic)), you should store the WebElement's ID and do your comparison on that, like so:

# Id will be a unique ID for the WebElement
used.append(pic.id)
...
# Elsewhere in your code
# Check if the ID is in the used list
if myElement.id in used:
    # If it is, we continue the loop (skip this iteration) or you can return (exit the block)
    continue

Alternatively, you could store the URL you wish to skip instead of the WebElement ID.
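As a rough sketch of the URL-based variant: the helper below, first_unused (a hypothetical name, not part of Selenium), takes the list returned by driver.find_elements(...) and a set of already-used src values, and returns the first image whose URL has not been seen yet. It assumes each image's src stays the same between page refreshes, which may make it more robust than element references if the page is reloaded between searches.

```python
def first_unused(images, used_urls):
    """Return the first element whose src is not in used_urls, or None.

    `images` is any iterable of objects with get_attribute(), such as the
    list returned by driver.find_elements(). `used_urls` is a set that is
    updated in place. Both names are illustrative assumptions.
    """
    for img in images:
        url = img.get_attribute("src")
        if not url or url in used_urls:
            continue  # no URL, or already used: try the next image
        used_urls.add(url)  # remember it so the next call skips it
        return img
    return None  # every image on the page has been used already
```

It could be called as pic = first_unused(driver.find_elements(By.CSS_SELECTOR, ".image-post img"), used_urls), with used_urls = set() created once before the outer loop. A None result means there is nothing new on the page yet.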

You'll notice I used the continue keyword, which stops executing the current iteration and moves on to the next one. Together with break, it is one of two helpful keywords that you should learn for what you're doing. Here's a good tutorial about those.
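To illustrate the two keywords without Selenium, here is a minimal sketch on plain strings. The function name pick_new and its limit parameter are made up for this example: continue skips entries that are already in used, and break leaves the loop as soon as enough new entries were found.

```python
def pick_new(urls, used, limit):
    """Collect up to `limit` URLs that are not in `used` (illustrative helper)."""
    picked = []
    for url in urls:
        if url in used:
            continue  # already used: skip the rest of this iteration
        picked.append(url)
        if len(picked) == limit:
            break  # found enough: leave the loop entirely
    return picked


print(pick_new(["a", "b", "c", "d"], {"a"}, 2))  # prints ['b', 'c']
```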

EDIT
You don't need to loop over the streams again; you can do everything you want in your first nested articles loop.

for stream in streams:
    articles = stream.find_elements(By.TAG_NAME, "article")
    for article in articles:
        try:
            title = article.find_element(By.CSS_SELECTOR, "header > a")
            # Add more code here that needs the `article` WebElement
            pic = article.find_element(By.CSS_SELECTOR, value='.post-container a img')
            ...
        except NoSuchElementException:
            continue

I think with this adjustment, you won't need to store and check the IDs because you won't be iterating over the same elements again.
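A possible way to write that single pass as a function is sketched below. Note that it swaps the try/except for find_elements (plural), which returns an empty list when nothing matches instead of raising NoSuchElementException. The function name collect_posts and the choice to return (title, URL) pairs are assumptions made for illustration; the selectors are the ones from the question.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR
TAG = "tag name"      # same string value as selenium's By.TAG_NAME


def collect_posts(streams):
    """Return (title_text, image_url) for every article that has both."""
    posts = []
    for stream in streams:
        for article in stream.find_elements(TAG, "article"):
            titles = article.find_elements(CSS, "header > a")
            pics = article.find_elements(CSS, ".post-container a img")
            if not titles or not pics:
                continue  # article lacks a title or an image: skip it
            posts.append((titles[0].text, pics[0].get_attribute("src")))
    return posts
```

Each article is visited once per call, so no ID bookkeeping is needed within a single pass. If the page is refreshed and scanned again, the returned URLs can still be compared against a set of used ones.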