I've got a large data frame in R consisting millions of unique addresses in a column (like 725 E 8TH ST).
In an effort to standardize addresses, I want to replace all the short street suffix abbreviations to their long form (changing E to East and ST to Street).
I am not looking to validate the addresses.
I found a package called campfin that has a tibble of many predefined abbreviations and their long forms (see usps_street).
I've tried to use the normal_address and the expand_abbrev functions to do this, but have been unsuccessful. Specifically, I tried normal_address(df$Address, abbs = usps_street) and expand_abbrev(df$Address, abb = usps_street). However, these are not replacing the abbreviations and the Address column is remaining unchanged.
Does anyone know how to properly use these functions? Or are there any suggestions for replacing these things? I don't want to defined short-forms (like abb = c("ST" = "STREET") ) because there are too many of them. Thus, I thought usps_street would be useful.
Here is a snippet of some of the address I'm looking at:
structure(list(Address = c("2771 NORTH THOMPSON RD NE", "10 BROOKWAY COURT",
"1000 WESTCLIFF AVE", "10009 NORTHEAST 145TH ST", "10021 PINECREST DRIVE",
"1003 WEST VICTORIA STREET"), Unit = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
City = c("ATLANTA", "MANSFIELD", "SAGINAW", "JONES", "PROVIDENCE VILLAGE",
"BRADY"), Country = c(NA, "US", "US", "US", "US", "US"),
Zip = c(30319, 76063, 76179, 73049, 76227, 76825), State = c("GA",
"TX", "TX", "OK", "TX", "TX")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
It looks like you're trying to do the inverse operation that the
usps_streetdata frame was created for. For example, look at the abbreviations for"ST":This is designed to normalise in the following ways:
However, you want to go the other way, meaning you have to select which of the longer values to map to.
I am going to assume that you want the longest value, so STREET in this case. We need to create a data frame where the mapping is:
We can do this fairly straightforwardly:
For STREET this looks like:
You can then use the
campfin::expand_abbrev()function: