How to convert numeric values to NA in SAS for an alphanumeric variable

83 Views Asked by At

I have a character variable that contains values that only include characters, values that only contain numeric, and other values that contain a combination of both numeric and alpha characters. I have included a small list of potential variable values below.

1811
1826
1st airport 
1000 islands
1111
: Heathrow
9928
 : Seattle 
AC2277

I am trying to recode values that only contain numerics as "NA" (i.e obs 1, 2, 5, 7), and I was wondering if anyone had any idea on how this can be done? The dataset I am working with is quite large (observations in the millions), so manually re-coding this variable based on the proc freq outputs can be quite exhaustive.

Any tips you would have to resolve this issue, would be very much appreciated!

I am unaware of any data steps that can do this request. I did not want to use the starts with or ends with number statement, as the middle characters could include alpha characters.

3

There are 3 best solutions below

0
Stu Sztukowski On

You can do this with regex, but it's much easier to do with the input function. We'll use input to try and convert the string into a number by checking if it follows the w. informat. If it returns a non-missing value, then we know it's a number. If it's a number, we'll replace the string with NA.

Data:

data have;
    input string$15.;
    datalines;
1811
1826
1st airport 
1000 islands
1111
: Heathrow
9928
 : Seattle 
AC2277
;
run;

Code:

data want;
    set have;
    if(input(string, 8.) NE .) then string = 'NA';
run;

Output:

string
NA
NA
1st airport
1000 islands
NA
: Heathrow
NA
: Seattle
AC2277
0
floydn On

In addition to Stu's answer, you can also use regex and notdigit functions.

data want;
set have;

* notidigit() > 0 if anything other than digit is found;
if not notdigit(strip(string)) then notdigit = 'NA';

* use regex to determine if entire string is digit(s);
if prxmatch('/^[\d]+$/', strip(string)) then regex = 'NA';
run;
string          notdigit    regex
1811            NA          NA
1826            NA          NA
1st airport     
1000 islands        
1111            NA          NA
: Heathrow      
9928            NA          NA
: Seattle       
AC2277      
0
freya On

As we know,compress can remove specific character.Modifier "d" will remove all the figure in string.Think it in reverse.If a string being compressed with letter "d" is null,which means it is a string only containing figure.

data have;
    input string$15.;
    datalines;
1811
1826
1st airport 
1000 islands
1111
: Heathrow
9928
 : Seattle 
AC2277
;
run;

code:

data want;
  set have;
  if compress(string,,"d") = "" then string="NA";
run;

if you got decimals or negative numbers,using compress(string,".-","d") will help.