First question here, I hope it makes sense how I write this out.
I am searching a massive list of emails, and if they are found on Google (I am in Germany, hence the German in the strings), updating the email validity column in the dataframe to reflect it... but it is not saving. It prints correctly, but when I check afterwards, the iterated values have not been stored.
# Script googling emails
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://google.de/search?q="Nicolas Cage"')
pyautogui.press('tab', presses=4)
pyautogui.press('enter')
df['email_validity'] = None
for email, domain_validity, email_validity in zip(df['email'], df['domain_validity'], df['email_validity']):
    if domain_validity == True:
        try:
            driver.get(f'https://google.de/search?q="{email}" after:1990')
            time.sleep(3) # loading url
            """pyautogui.hotkey('escape', presses=2)"""
            time.sleep(2)
            if 'die alle deine Suchbegriffe enthalten' not in driver.page_source and 'übereinstimmenden Dokumente gefunden' not in driver.page_source and 'Es wurden keine Ergebnisse gefunden' not in driver.page_source:
                email_validity = True
                print(email_validity)
            elif 'not a robot' in driver.page_source:
                print('help me!')
                input("write anything, and press enter:")
            else:
                email_validity = False
                print(email_validity)
        except:
            print(email)
    else:
        email_validity = domain_validity
driver.close()
print('completed')
df.head()
You haven't updated df in the loop. Your variables email, domain_validity, and email_validity contain the values from the tuple returned by zip(). Changing them does not modify the dataframe.

df.at

You need to update the dataframe using df.at at the end of each iteration.

df.apply()

You could also extract your email validation check to a separate function, and use apply() on the whole column instead of looping. You can remove the if domain_validity == True: check and use that as a lambda function on apply instead.

That might not be straightforward for you since the 'not a robot' case needs to be handled and return a value.
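
Here is a minimal sketch of both suggestions, so you can see the write-back in isolation. It is not your actual script: email_found() and KNOWN_EMAILS are hypothetical stand-ins for the Selenium page_source check, the sample dataframe is made up, and the email_validity_apply column name is only there so the two variants can be compared side by side.

```python
import pandas as pd

# Hypothetical stand-in for the Selenium/Google lookup in the question.
# It only pretends that these addresses were "found".
KNOWN_EMAILS = {"a@example.com"}


def email_found(email: str) -> bool:
    """Placeholder check; swap in the real driver.page_source test here."""
    return email in KNOWN_EMAILS


# Made-up sample data with the same column names as in the question.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@invalid"],
    "domain_validity": [True, True, False],
})

# Variant 1: keep the loop, but write each result back with df.at.
# Iterating over df.index gives the row label that df.at needs.
df["email_validity"] = None
for idx, email, domain_validity in zip(df.index, df["email"], df["domain_validity"]):
    if domain_validity:
        df.at[idx, "email_validity"] = email_found(email)
    else:
        df.at[idx, "email_validity"] = domain_validity

# Variant 2: no explicit loop. apply() with axis=1 calls the lambda once
# per row, and the returned Series is assigned as a whole column.
df["email_validity_apply"] = df.apply(
    lambda row: email_found(row["email"]) if row["domain_validity"] else False,
    axis=1,
)

print(df)
```

In the first variant the only real change from your loop is that the result is assigned to df.at[idx, 'email_validity'] instead of to the local email_validity variable. In the second variant your function has to return a value on every path, so the 'not a robot' branch would need to retry the check or return something like None after you solve the captcha.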