Connection to HTTPSConnectionPool broken by ReadTimeoutError (Retrying)

When executed, the code below takes a long time.

for index, row in d_airbnb.iterrows():  # Loop over every row.
    latitude = row['latitude']
    longitude = row['longitude']
    # Use geo_locator.reverse() to check whether each zipcode is valid.
    location = geo_locator.reverse((latitude, longitude), language='en', exactly_one=True, timeout=5)
    address = location.address
    if row['neighbourhood'] not in address:
        # If the zipcode does not match the neighbourhood returned by the API,
        # it is invalid and inconsistent, so replace it with "00000".
        d_airbnb.at[index, 'zipcode'] = '00000'

I tried increasing the timeout to 2 seconds, but it still takes a long time and produces some timeout errors:

WARNING:urllib3.connectionpool:Retrying 
(Retry(total=1, connect=None, read=None, redirect=None, status=None)) 
after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): 
Read timed out. (read timeout=1)")'

What causes this, and how can I handle it?
Is there a faster way to check these zipcodes? My dataset has 70,000 rows and 29 columns.

Answer by Karol Oleksy:

The public Nominatim server (nominatim.openstreetmap.org) has a usage policy that does not allow heavy use: a maximum of 1 request per second. So, sleep for 1 second between requests and store the results so you do not re-download data you have already obtained.
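As a rough illustration of the "sleep and store" idea, here is a minimal sketch using only the standard library. The helper name make_throttled_cached, its parameters, and the choice to round coordinates to 4 decimal places for the cache key are all assumptions made for this example, not part of geopy or of the original code. Note that at one request per second, 70,000 uncached lookups would still take roughly 19 hours, so the cache only helps where coordinates repeat.

```python
import time


def make_throttled_cached(lookup, min_delay_seconds=1.0, precision=4,
                          clock=time.monotonic, sleep=time.sleep):
    """Wrap lookup(lat, lon) so that real calls are spaced at least
    min_delay_seconds apart and repeated coordinates come from a cache.

    All names here are hypothetical; `precision` (decimal places used for
    the cache key) is an assumption you may want to tune.
    """
    cache = {}
    last_call = [None]  # time of the most recent real lookup, if any

    def wrapped(lat, lon):
        key = (round(lat, precision), round(lon, precision))
        if key in cache:
            return cache[key]  # no network request, no waiting
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)  # stay under the rate limit
        try:
            result = lookup(lat, lon)
        finally:
            # Record the time even if the lookup raised (e.g. a timeout),
            # so the next attempt is still throttled.
            last_call[0] = clock()
        cache[key] = result
        return result

    return wrapped


# Hypothetical usage with the geo_locator from the question:
# reverse = make_throttled_cached(
#     lambda lat, lon: geo_locator.reverse(
#         (lat, lon), language='en', exactly_one=True, timeout=5))
# location = reverse(row['latitude'], row['longitude'])
```

If you are using geopy, it also ships a RateLimiter helper in geopy.extra.rate_limiter that handles the delay and retries for you; the sketch above only shows the underlying idea.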

There are two more options:

  1. Use other available commercial third-party providers
  2. Install your own instance of Nominatim (not recommended for your case, in my opinion)