When executed, the code below takes a long time.
```python
for index, row in d_airbnb.iterrows():  # loop over every row
    latitude = row['latitude']
    longitude = row['longitude']
    # Use geo_locator.reverse() to check whether each zipcode is valid.
    location = geo_locator.reverse((latitude, longitude), language='en',
                                   exactly_one=True, timeout=5)
    address = location.address
    if row['neighbourhood'] not in address:
        # If the zipcode does not match the neighbourhood returned by the
        # API, it is invalid/inconsistent, so replace it with "00000".
        d_airbnb.at[index, 'zipcode'] = '00000'
```
I tried increasing the timeout to 2 seconds, but it still takes a long time and produces some timeout errors:

```
WARNING:urllib3.connectionpool:Retrying
(Retry(total=1, connect=None, read=None, redirect=None, status=None))
after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443):
Read timed out. (read timeout=1)")'
```
What causes this, and how can I fix it?
Is there a faster way to check these zipcodes? My dataset has 70,000 rows and 29 columns.
Nominatim's usage requirements do not allow intensive use: a maximum of 1 request per second. So you can sleep for 1 second between requests and store the data you have already obtained, to avoid re-downloading it. There are two more options.
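The two ideas above (a 1-second delay plus a cache of already-seen coordinates) can be sketched as a small wrapper class. This is a self-contained sketch, not the geopy API: `fake_reverse` is a stub standing in for your real `geo_locator.reverse`, so the example runs without any network access.

```python
import time

calls = []  # records every "real" lookup, so we can see the cache working

def fake_reverse(coords):
    """Stub for geo_locator.reverse: returns a fake address string."""
    calls.append(coords)
    return f"address for {coords}"

class CachedGeocoder:
    """Cache results per coordinate and wait >= min_delay between real calls."""

    def __init__(self, reverse_fn, min_delay=1.0):
        self.reverse_fn = reverse_fn
        self.min_delay = min_delay
        self.cache = {}
        self._last_call = 0.0

    def reverse(self, coords):
        if coords in self.cache:            # cache hit: no network, no delay
            return self.cache[coords]
        wait = self.min_delay - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)                # respect the 1 request/second policy
        result = self.reverse_fn(coords)
        self._last_call = time.monotonic()
        self.cache[coords] = result
        return result

geocoder = CachedGeocoder(fake_reverse, min_delay=1.0)
print(geocoder.reverse((40.7, -74.0)))  # first call goes through the stub
print(geocoder.reverse((40.7, -74.0)))  # second call is served from the cache
```

With duplicate coordinates in the dataset, the cache alone can cut the number of real requests substantially. In practice you could also use geopy's built-in `geopy.extra.rate_limiter.RateLimiter` (with `min_delay_seconds=1`) instead of the hand-rolled delay here.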