How to geocode a table of invalid/incorrect locations in R?


I have collected users' location data from Twitter and am trying to plot it on a map in R. The problem is that users have entered invalid or incorrect addresses, which causes the geocode function to fail. How can I avoid this failure? Is there a way to check for this error case and skip the lookup? For example, the user location data in a file such as geocode9.csv looks something like this:

available locations, Buffalo, New York, thsjf, Washington, USA Michigan, nkjnt, basketball, ejhrbvw

library(ggmap)
fileToLoad <- file.choose(new = TRUE)
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{

  result <- geocode(origAddress$available_locations[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[1])
  origAddress$lat[i] <- as.numeric(result[2])
  origAddress$geoAddress[i] <- as.character(result[3])

}
write.csv(origAddress, "geocoded.csv", row.names=FALSE)

When the loop reaches "thsjf" in the list of locations, it throws an error. How can I get past this error? I want something like if(false){ # do not run geocode function }.


There is 1 best solution below

ASH On

I'm not sure how those addresses can be geocoded if they are actually wrong. The geocoder has no way to work out what a wrong address was meant to be. I think you need to get the addresses corrected, and THEN geocode everything. Here is some sample code.

#load ggmap
library(ggmap)

startTime <- Sys.time()

# Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)


# Read in the CSV data and store it in a variable 
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)


# Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)


# Loop through the addresses to get the latitude and longitude of each address and add them to the
# origAddress data frame in new columns lat and lon.
# Note: this sample reads a column named "addresses"; substitute your own column name
# (available_locations in the question).
for (i in seq_len(nrow(origAddress))) {
  # print("Working...")
  result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[1])
  origAddress$lat[i] <- as.numeric(result[2])
  origAddress$geoAddress[i] <- as.character(result[3])
}


# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names=FALSE)

endTime <- Sys.time()
processingTime <- endTime - startTime
processingTime
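
If you would rather skip the bad rows than stop on them, one option is to wrap the call in tryCatch(). The following is a minimal sketch and has not been run against the live Google API. The names safe_geocode, geocode_column and geocode_fn are made up for this example. It assumes that a failed lookup either raises an error or comes back with NA coordinates, possibly without an address column, in which case result[3] in the loop above would fail. Recent ggmap versions also need a Google API key registered with register_google() before source = "google" works.

```r
# Sketch: geocode one address without letting a failure stop the run.
# `safe_geocode` and `geocode_fn` are hypothetical names for this example.
# The default geocoder assumes ggmap is installed and an API key is registered.
safe_geocode <- function(address, geocode_fn = ggmap::geocode) {
  # Row returned whenever the address cannot be located
  empty <- data.frame(lon = NA_real_, lat = NA_real_,
                      geoAddress = NA_character_,
                      stringsAsFactors = FALSE)

  # Skip missing or blank input without calling the geocoder at all
  if (is.na(address) || !nzchar(trimws(address))) {
    return(empty)
  }

  # Turn any error raised by the geocoder into NULL
  result <- tryCatch(
    geocode_fn(address, output = "latlona", source = "google"),
    error = function(e) NULL
  )

  # Treat a missing result, or one without usable coordinates, as a failure
  if (is.null(result) ||
      !all(c("lon", "lat") %in% names(result)) ||
      is.na(result$lon[1]) || is.na(result$lat[1])) {
    return(empty)
  }

  # The address column is optional here, so fall back to NA if it is absent
  geo_address <- if ("address" %in% names(result)) {
    as.character(result$address[1])
  } else {
    NA_character_
  }

  data.frame(lon = as.numeric(result$lon[1]),
             lat = as.numeric(result$lat[1]),
             geoAddress = geo_address,
             stringsAsFactors = FALSE)
}

# Apply safe_geocode() to every value in one column and append the results.
# `geocode_column` is also a hypothetical helper name.
geocode_column <- function(df, column, geocode_fn = ggmap::geocode) {
  rows <- lapply(df[[column]], safe_geocode, geocode_fn = geocode_fn)
  cbind(df, do.call(rbind, rows))
}
```

With this in place, something like origAddress <- geocode_column(origAddress, "available_locations") would leave NA in lon, lat and geoAddress for entries such as "thsjf" instead of stopping. Passing the geocoder as an argument also lets you try the logic with a stand-in function before spending API quota.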

Check this for more info.

http://www.storybench.org/geocode-csv-addresses-r/
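
Once the table has lon and lat columns, the rows that could not be located can be separated out, so that only usable rows are plotted and the rest are kept for manual correction. This is a small sketch, assuming failed rows carry NA in lon or lat. split_geocoded is a hypothetical helper name, not part of ggmap.

```r
# Sketch: split a geocoded table into located and failed rows.
# Assumes `df` has numeric columns `lon` and `lat`, with NA marking a failure.
split_geocoded <- function(df) {
  ok <- !is.na(df$lon) & !is.na(df$lat)
  list(
    located = df[ok, , drop = FALSE],   # safe to plot
    failed  = df[!ok, , drop = FALSE]   # needs manual correction
  )
}

# Possible usage (the file name is illustrative):
# parts <- split_geocoded(origAddress)
# write.csv(parts$failed, "to_correct.csv", row.names = FALSE)
```

The failed rows can then be fixed by hand and geocoded again, which is the "correct first, then geocode" approach described above.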