I've been searching for a question related but I have had no luck as of yet. I am looking to transform a long list of independent variables for a regression analysis. A dummy dataset would look like this:
DATA TEST (DROP = i);
DO i = 1 to 4000;
VAR = i + 100000;
output;
end;
run;
PROC TRANSPOSE
DATA = TEST
OUT = TEST_T
(DROP = _NAME_)
PREFIX = X_;
ID VAR;
VAR VAR;
RUN;
DATA TEST_ARRAY;
SET TEST_T;
ARRAY X[*] X_:;
DO J = 1 TO 40;
DO I = 1 TO DIM(X);
X[I] = RANUNI(0)*I;
OUTPUT;
END;
END;
RUN;
In this case the variable names X_i are increasing monotonically, in reality, my variables are actually X_number where the number is a six digit unique identifier. I've been trying to log transform and square all these variables such that I have a new X matrix with the following columns
X_133456 X_SQ_133456 LOG_X_133456
I tried looping a list through all variables like this
PROC CONTENTS
DATA = TEST_ARRAY
OUT = CONTENTS;
RUN;
PROC SQL NOPRINT;
SELECT NAME INTO: REG_FACTORS
SEPARATED BY " "
FROM CONTENTS;
QUIT;
DATA WANT;
SET TEST_ARRAY;
%LET index = 1;
%DO %UNTIL (%SCAN(®_factors.,&index.," ")=);
%LET factors = %SCAN(®_factors.,&index.," ");
LOG_X_&FACTORS. = LOG(X_&FACTORS.);
X_SQ_&FACTORS. = (X_&FACTORS.) ** 2;
%LET index = %EVAL(&Index + 1);
%END;
RUN;
but this blows up my server and I need to find a more efficient way of doing this, thanks in advance
EDIT: for the contributor - I managed to solve at 13:04
%LET input_factors = X_:;
PROC SQL;
SELECT
NAME
, TRANWRD(NAME,%SCAN(&input_factors.,1,'_'),'SQ')
, TRANWRD(NAME,%SCAN(&input_factors.,1,'_'),'LOG')
INTO
:factor_list separated by " "
, :sq_factor_list separated by " "
, :log_factor_list separated by " "
FROM
contents
WHERE
VARNUM < 5
WHERE
NAME LIKE "%SCAN(&input_factors.,1,'_')_"
ORDER BY
INPUT(SCAN(NAME,-1,'_'),8.)
;
QUIT;
%PUT &factor_list.;
%PUT &sq_factor_list.;
%PUT &log_factor_list.;
Use 3 arrays, one for the input values (e.g.
X_31415), and two for the new calculated values (log and square).The trick is to dynamically generate the variable names for the calculated variables, based on the original variable names.
/* Use dictionary table to get/generate vnames */ proc sql ; select name, /* X_31415 */ tranwrd(name,'X_','X_SQ_'), /* X_SQ_31415 */ tranwrd(name,'X_','LOG_X_') /* LOG_X_31415 */ into :VARLIST separated by ' ', :SQLIST separated by ' ', :LOGLIST separated by ' ' from dictionary.columns where libname = 'WORK' and memname = 'MYDATA' and name like 'X_%' order by input(scan(name,-1,'_'),8.) /* order based on the numeric suffix */ ; quit ;Now you can assign three arrays, looping over the input values and calculate the square and log accordingly.
data array3 ; set mydata ; array in{*} &VARLIST ; /* X_1 X_17 X_31415 X_99999 */ array sq{*} &SQLIST ; /* X_SQ_1 X_SQ_17 X_SQ_31415 X_SQ_99999 */ array lg{*} &LOGLIST ; /* LOG_X_1 LOG_X_17 LOG_X_31415 LOG_X_99999 */ do i = 1 to dim(in) ; sq{i} = in{i} ** 2 ; lg{i} = log(in{i}) ; end ; drop i ; run ;