filtering from group that satisfies group criteria

56 Views Asked by At
structure(list(Year = c(2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2006, 2006, 2006, 2006, 2006, 2006, 2006), CompanyID = c(5, 
5, 5, 6, 6, 7, 7, 7, 5, 5, 5, 6, 6, 7, 7), Age = c(55, 48, 62, 
62, 41, 62, 55, 63, 55, 48, 62, 62, 41, 62, 55), Gender = c("M", 
"M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "F", "F", "M", 
"F"), JOIN = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1), 
    LEAVE = c(0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L))

From the above data frame (which is a small sample of a large dataset)- I would like to filter rows satisfying the following two conditions -

  1. In each group (group_by (Year, CompanyID), Each year (Year variable) a firm (which is CompanyID) must appoint (JOIN variable) only one female director (GENDER variable) to replace a departing (LEAVE variable) male director (GENDER variable) in the year

AND

  1. departing male director to be older than 60 (Age variable)

The expected output is -

tribble(
  ~"Year", ~"CompanyID", ~"Age", ~"Gender", ~"JOIN", ~"LEAVE",
  2005,        6,          62,      'M',       0,       1,
  2005,        6,          41,      'F',       1,       0,
  2006,        7,          62,      'M',       0,       1,
  2006,        7,          55,      'F',       1,       0
)

Because the expected output satisfies the above two criteria - that each year each company replace  a departing male director with a female director and the departing male director age is greater than 60.
2

There are 2 best solutions below

1
MeminT99 On
library(tidyverse)
library(magrittr)

data %<>%
  group_by(Year, CompanyID) %>%
  filter( (LEAVE == 1) & (Gender == "M") & (Age > 60) | (JOIN == 1) & (Gender == "F") )
0
UMUTCANKILIC On

Below code works.

Here, I first filtered the youngest woman and oldest man as potential ones. Then, I find the instances where there are both a potentially leaving person and a potentially entering person. For example, in potential_appointments matrix, there are 6 rows. However, company 7 does not have anyone in 2005 to replace the leaving person, so this should be excluded from the list. I attach the screenshot of the output.

Output Image

potential_appointments <- df %>%
  group_by(Year, CompanyID) %>%
  filter(Gender == "F" & Age == min(Age) | Gender == "M" & Age == max(Age)) %>%
  filter((((Gender == "M") & (LEAVE == 1) & (Age > 60))) | ((JOIN == 1) & 
(Gender == "F")))

valid_candidates <- 
potential_appointments[duplicated(potential_appointments[c("Year", 
"CompanyID")]) | duplicated(potential_appointments[c("Year", "CompanyID")], 
fromLast = TRUE),]
valid_candidates