I have a character variable whose values fall into three groups: values that contain only letters, values that contain only digits, and values that contain a combination of both. I have included a small list of potential variable values below.
1811
1826
1st airport
1000 islands
1111
: Heathrow
9928
: Seattle
AC2277
I am trying to recode the values that contain only digits as "NA" (i.e., obs 1, 2, 5, 7), and I was wondering if anyone has an idea of how this can be done. The dataset I am working with is quite large (millions of observations), so manually recoding this variable based on the proc freq output would be exhausting.
Any tips you have for resolving this issue would be very much appreciated!
I am unaware of any data step function that can do this. I did not want to use a "starts with" or "ends with a number" condition, as the middle characters could include alpha characters.
You can do this with regex, but it's much easier to do with the `input` function. We'll use `input` to try to convert the string into a number by checking whether it follows the `w.` informat. If it returns a non-missing value, then we know it's a number. If it's a number, we'll replace the string with `NA`.
Data:
Code:
Output:
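A minimal sketch of the approach described above, using the sample values from the question. The dataset names `have` and `want`, the variable name `place`, and its length of 20 are assumptions made for illustration, so adjust them to your data.

```sas
/* Sample data from the question; "place" is a hypothetical variable name */
data have;
    infile datalines truncover;
    input place $20.;
    datalines;
1811
1826
1st airport
1000 islands
1111
: Heathrow
9928
: Seattle
AC2277
;
run;

data want;
    set have;
    /* Try to read the whole string with the w. informat.
       The ?? modifier suppresses invalid-data notes in the log
       and keeps _ERROR_ from being set when the conversion fails. */
    if not missing(input(place, ?? 32.)) then place = 'NA';
run;

proc print data=want;
run;
```

With these nine sample values, observations 1, 2, 5 and 7 should become `NA` and the rest should be left unchanged. One caveat: the `w.` informat also accepts signs, decimal points and scientific notation, so strings such as `-12`, `3.5` or `1E5` would be recoded as well. If only pure digit strings should match, a condition along the lines of `not missing(place) and notdigit(strip(place)) = 0` is one possible alternative.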