I have a panel data for companies between years 2001-2019. I need to clean this dataset so that I drop all firms which have a missing data in one of the explanatory variables for regression. The problem is that I only need to drop whole firm if there is a gap in data. If the firm has data for 2003-2007 I need to keep it. Only if it has data for example from 2001 to 2003 and than 3 years missing and than again have all data from 2007 to 2019 I need to drop it. In this second case there is a gap and I need to drop all observations for that firm (even if this happens only for one variable).
You will help me a lot if you have any ideas. I tried a lot, but could not find a way how to do this.
I tried this, but does not work properly:
firms_with_gaps <-
panel_data %>%
group_by(company_id) %>%
filter(!(all(year >= 2001 & year <= 2019))) %>%
pull(company_id)
Thank you very much for your ideas!
Instead of using all, you should use any to check if there are any years missing within the specified range for each firm. updated code
or