Trying to remove "ZCTA" from rows


I am trying to extract only the ZIP code values from my imported ACS data file. However, every row includes "ZCTA" before the 5-digit ZIP code. Is there a way to remove it so that only the 5-digit ZIP code remains?

Example:

Image of data frame with ZCTA and Zip

I tried using strtrim on the data, but I can't figure out how to target the last 5 digits. I imagine there is a function or loop that could do this, since the dataset is so large.


There are 2 best solutions below

KacZdr On

To remove "ZCTA5":

# df is your data.frame; \\s* also drops any space left after the prefix
df$zip <- gsub("ZCTA5\\s*", "", df$zip)

or

library(stringr)
df$zip <- str_replace(df$zip, "ZCTA5\\s*", "")

To extract the ZIP code (the last 5 characters):

str_sub(df$zip, -5, -1)
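
If you would rather not load stringr, the same two ideas can be sketched in base R. The data frame below is made up for illustration (the original image is not available, so the column name `zip` and the "ZCTA5 " prefix with a space are assumptions):

```r
# Hypothetical sample data; the real ACS column may be named differently
df <- data.frame(zip = c("ZCTA5 01001", "ZCTA5 90210"),
                 stringsAsFactors = FALSE)

# Approach 1: drop the prefix and any whitespace that follows it
no_prefix <- sub("^ZCTA5\\s*", "", df$zip)

# Approach 2: keep only the last 5 characters of each string
n <- nchar(df$zip)
last5 <- substr(df$zip, n - 4, n)

# Store the result as character so leading zeros (e.g. "01001") are kept
df$zip <- last5
```

Both approaches are vectorized, so no explicit loop should be needed even on a large dataset.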
AndS. On

Here are a few others for fun:

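# option 1: extract the trailing digits that follow a whitespace character
stringr::str_extract(df$zip, "(?<=\\s)\\d+$")

# option 2: replace the whole string with the captured trailing digits
gsub("^.*\\s(\\d+)$", "\\1", df$zip)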
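
One difference worth knowing: when the pattern does not match, gsub() returns the input unchanged, whereas str_extract() returns NA. A minimal sketch of a sanity check after option 2, using made-up input (the `zip_raw` vector and the malformed "unknown" row are assumptions for illustration):

```r
# Hypothetical input, including one malformed row to show the failure mode
zip_raw <- c("ZCTA5 01001", "ZCTA5 90210", "unknown")

# Same pattern as option 2: capture the trailing digits after whitespace
zip <- gsub("^.*\\s(\\d+)$", "\\1", zip_raw)

# Rows that did not match come back unchanged, so flag anything
# that is not exactly five digits and set it to NA
ok <- grepl("^\\d{5}$", zip)
zip[!ok] <- NA_character_
```

Running `table(ok)` afterwards gives a quick count of how many rows failed to parse.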