Creating lag variable using for loop

What I want to do: if hmonth=2 and hyear=2000, subtract each observation of wageratio.female from that of hmonth=1 and hyear=2000. If hmonth=2 and hyear=2001, subtract each observation of wageratio.female from that of hmonth=1 and hyear=2001. Repeat for all hmonth and hyear, and store the differences in a new variable called wageratio.lags.

Below is a small section of my attempt at a for loop. Should I be using a for loop to achieve my desired output?

differences = list()

for i in range(len(hmonth)):
    # Check if the current pair is (2, 2000) or (2, 2001)
    if hmonth[i] == 2:
        if hyear[i] == 2000:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2000
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)
        elif hyear[i] == 2001:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2001
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)

This gives:

Error: unexpected symbol in "for i"

Desired output:

hmonth hyear wageratio.female wageratio.lags
1 2000 -0.43 -0.01
1 2001 0.18 -0.62
2 2000 -0.44 0.12
2 2001 -0.44 -0.47
3 2000 -0.32 -0.45
3 2001 -0.91 0.70
4 2000 -0.77 1.24
4 2001 -0.21 NA
5 2000 0.47 NA

Sample data:

df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
 )

MrFlick On BEST ANSWER

You can use the dplyr lead/lag functions to do this without a loop. For example:

library(dplyr)
df %>% 
  group_by(hyear) %>% 
  arrange(hmonth) %>% 
  mutate(wageratio.lags = lead(wageratio_female) - wageratio_female) %>%
  ungroup()

produces

  wageratio_female     hmonth      hyear    wageratio.lags
             <dbl> <hvn_lbll> <hvn_lbll>   <dbl>
1            -0.43          1       2000 -0.0100
2             0.18          1       2001 -0.62  
3            -0.44          2       2000  0.12  
4            -0.44          2       2001 -0.47  
5            -0.32          3       2000 -0.45  
6            -0.91          3       2001  0.7   
7            -0.77          4       2000  1.24  
8            -0.21          4       2001 NA     
9             0.47          5       2000 NA
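
If you would rather not depend on dplyr, or you do want to see the loop, here is a minimal base R sketch of the same calculation. The "unexpected symbol" error in the question comes from the loop being written in Python syntax; R expects `for (i in ...) { ... }`. The sketch assumes, as the dplyr version does, that every year has consecutive months with no gaps, so the next row within a year is always the next month. The column `loop_check` is a name made up here just to compare the two approaches.

```r
# Sample data from the question
df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

# Vectorised version: within each year, next month's value minus this month's.
# Relies on rows within each year already being in month order (true here).
df$wageratio.lags <- ave(
  df$wageratio_female, df$hyear,
  FUN = function(x) c(diff(x), NA)
)

# Explicit loop version, written with R's for (...) { ... } syntax.
# loop_check is a hypothetical column used only for comparison.
df <- df[order(df$hyear, df$hmonth), ]   # year-major order, months ascending
df$loop_check <- NA_real_
for (i in seq_len(nrow(df) - 1)) {
  # Only take the difference when the next row belongs to the same year
  if (df$hyear[i + 1] == df$hyear[i]) {
    df$loop_check[i] <- df$wageratio_female[i + 1] - df$wageratio_female[i]
  }
}

# Restore the month-major order used in the question
df <- df[order(df$hmonth, df$hyear), ]
```

Both columns should agree with the wageratio.lags values shown above. The loop is fine for small data, but the vectorised form is shorter and is usually the more idiomatic choice in R.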