SAS-BASE--- How sum up a value with the following one on specific conditions

46 Views Asked by At

I have this table:

![enter image description here

I would like to add two columns using SAS base :

  • _idx_new that aggregates class having perc_tot_1<=1 with the next ones untill the new percentage is greater than 1
  • the perc_tot_2 having the new percentages

Here the output table:

enter image description here

I tried with this SAS code to compute idx_new:

data test1;
 set dataset;
 by _idx;
 perc_tot = _freq/&tot.;
 perc_tot_1 = _freq/&tot.*100;
 retain _idx_new 0;
 if perc_tot_1 < 1 and lag1(perc_tot_1)>=1 then do;
    _idx_new = _idx_new+1;
 end;
 if perc_tot_1 < 1 and lag1(perc_tot_1)<1 then do;
    _idx_new = _idx_new;
    a=1;
 end;
run;

But it doesn't work properly. And I don't know how to compute the perc_tot_2.

Could you help me? Thanks!

3

There are 3 best solutions below

0
Tom On

I suspect you are working too hard. Since you didn't provide any data let's just make some up.

data have;
  row+1;
  input percent;
cards;
30
0.5
0.4
1.2
40
0.1
0.2
0.9
6
;

Now just process the data and when either the current percent or the previous percent is BIG then increment the new grouping variable. You also want to increment on the first observation, even if the percent is small (otherwise the group numbers will count from zero instead of one).

data want;
  set have;
  if (_n_=1) or (percent>1) or (lag(percent)>1) then idx+1;
run;

Result

Obs    row    percent    idx

 1      1       30.0      1
 2      2        0.5      2
 3      3        0.4      2
 4      4        1.2      3
 5      5       40.0      4
 6      6        0.1      5
 7      7        0.2      5
 8      8        0.9      5
 9      9        6.0      6

If you then want to aggregate the little percentages you could just summarize by the new grouping variable.

proc summary data=want;
  by idx;
  var percent;
  output out=collapsed(drop=_type_) sum=;
run;

Result:

Obs    idx    _FREQ_    percent

 1      1        1        30.0
 2      2        2         0.9
 3      3        1         1.2
 4      4        1        40.0
 5      5        3         1.2
 6      6        1         6.0
0
Quentin On

I interpreted the requirement slightly different than Tom, this will give different results. If there are three consecutive values for percent that are less than 1 (say .5, .6, .2) this approach will put .5 and .6 into one group, and .2 into another group, because .5 and .6 sum to > 1. Tom's approach would put them all into one group because they are all < 1. It's not clear to me which you intend. I used your data, but added four records at the end to illustrate this.

Using data:

data have;
  idx++1;
  input percent;
cards;
35.9
0.08
16.5
0.04
0.04
2.5
1.4
2.2
0.2
40.1
0.74
0.3
35
0.5
0.6
0.2
;

I used a "Double-DOW" loop. This reads the data in groups, starting a new group each time the running sum of percent is > 1.

data want ;
  idx_new++1 ;
  do until(percent_new>1 or last) ;
    set have end=last;
    percent_new=sum(percent_new,percent) ;
  end ;
  do until(_percent_new>1) ;
    set have ;
    _percent_new=sum(_percent_new,percent) ;
    output ;
  end ;
  drop _: ;
run ;

The result is:

                             percent_
idx    percent    idx_new       new

  1     35.90         1        35.90
  2      0.08         2        16.58
  3     16.50         2        16.58
  4      0.04         3         2.58
  5      0.04         3         2.58
  6      2.50         3         2.58
  7      1.40         4         1.40
  8      2.20         5         2.20
  9      0.20         6        40.30
 10     40.10         6        40.30
 11      0.74         7         1.04
 12      0.30         7         1.04
 13     35.00         8        35.00
 14      0.50         9         1.10
 15      0.60         9         1.10
 16      0.20        10         0.20

The double-DOW loop is an advanced technique. It may be simpler to do it in two steps, as Tom suggested. First make IDX_NEW:

data want1 ;
  set have ;
  if _n_=1 or percent2>1 then do ;
    idx_new++1 ;
    percent2=0 ;
  end ;
  percent2++percent ;
  drop percent2 ;
run ;

Then you can aggregate by idx_new:

proc sql ;
  create table want2 as
  select *,sum(percent) as percent_new 
  from want1
  group by idx_new
  order by idx
  ;
quit ;
0
Richard On

Sequential DOW loop is a useful tool for computing and then applying a groupwise computation.

Example:

data want ;
  do _n_ = 1 by 1 until (grp_percent_n > 1 or end) ;
    set have end=end ;
    grp_percent_n = sum(grp_percent_n, perc_tot) ;
  end ;
  grp_id + 1 ;
  do within_grp_index = 1 to _n_ ;
    set have ;
    within_group_pct = perc_tot / grp_percent_n * 100 ;

    OUTPUT ;
  end ;
run ;