Flag accounts on the basis of reapplication logic ( case of multiple entries by one person)

54 Views Asked by At

I'm doing a case study for which I need some help.

Background - There are a bunch of people who have applied for Visa and have made multiple applications through different channels. I have to find out if a person has reapplied within 30 days of their previous application (Y/N), their reapplied channel and date. Every entry should be considered independent irrespective of multiple entries by a single person. [See attached image] Ex - PETE5O made first application on 5th Aug but second was not made within 30 days hence is it is not reapplied but the third was made within 30 days of second and hence it is reapplied and also the channel of reapplication would be channel of third application here (which is basically reapplication), same goes with date, so and so forth. There can be n number of applications made by a single person. Also, there can be chances of multiple reapplications. Please advise what should I do. I have info in yellow but want to get the info in blue.

enter image description here

Code -

data have;
 format apply_date date9.;
 input Id $ Channel $ Apply_date date9. ;
 cards;
 SAM1D Online 1-Oct-22
 SAM1D Kiosk 9-Oct-22
 PETE5O Office 5-Aug-22
 PETE5O Kiosk 6-Sep-22
 PETE5O Online 8-Sep-22
 PETE5O Kiosk 5-Oct-22
 ;

Thanks in advance.

Expecting results in blue to be populated. [See attached image]

3

There are 3 best solutions below

0
Tom On BEST ANSWER

To calculate based on future actions process the data in reverse chronological order. That way you can use LAG() to remember those future events.

proc sort data=have ;
  by id descending apply_date;
run;
data want;
  set have;
  by id descending apply_date;
  length reapply_flag $3 reapply_channel $6 reapply_date 8;
  format reapply_date date9.;
  lag_date=lag(apply_date);
  lag_channel=lag(channel);
  reapply_flag='No';
  if first.id then call missing(of reapply_channel reapply_date);
  else if lag_date - apply_date <= 30 then do;
    reapply_flag='Yes';
    reapply_channel=lag_channel;
    reapply_date=lag_date;
  end;
  drop lag_: ;
run;

Result:

                               Apply_    reapply_    reapply_     reapply_
Obs      Id      Channel         date      flag      channel          date

 1     PETE5O    Kiosk      05OCT2022      No                            .
 2     PETE5O    Online     08SEP2022      Yes        Kiosk      05OCT2022
 3     PETE5O    Kiosk      06SEP2022      Yes        Online     08SEP2022
 4     PETE5O    Office     05AUG2022      No                            .
 5     SAM1D     Kiosk      09OCT2022      No                            .
 6     SAM1D     Online     01OCT2022      Yes        Kiosk      09OCT2022
1
Баба Капа капочка On

format apply_date date9.; input Id $ Channel $ Apply_date date9. ; cards; SAM1D Online 1-Oct-22 SAM1D Kiosk 9-Oct-22 PETE5O Office 5-Aug-22 PETE5O Kiosk 6-Sep-22 PETE5O Online 8-Sep-22 PETE5O Kiosk 5-Oct-22 ;

0
Richard On

You can use a 1:1 merge of the data with itself, the second one being offset by one row (firstobs=2) and using data set option rename= to create variables for the 'lead' condition.

Example:

 data want;
   merge
     have 
     have(firstobs=2 rename=(id=nextid channel=nextchannel apply_date=nextapply_date))
  ;

  if id = nextid then do;
    if nextapply_date - apply_date <= 30 then do;
      reapply_flag = 'Y';
      reapply_channel = nextchannel;
      reapply_date = nextapply_date;
      reapply_interval = nextapply_date - apply_date;
    end;
  end;
  else do;
    reapply_flag = 'N';
  end;

  format reapply_date date9.;
  drop next:;
run;