I have a longitudinal dataset in R comprising several countries observed over multiple time points. To keep things simple, consider the following example:
set.seed(123)
df <- data.frame(
  Country  = c(rep("DEU", 16), rep("FRA", 16), rep("ITA", 16)),
  Year     = rep(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)), 3),
  industry = rep(c("A", "B", "C", "D"), 12),
  h_emp    = rnorm(48, 15, 3.5)
)
The objective is to create a new row for each country and year, always labeled in the industry column as "C+D". The corresponding cell in h_emp should be equal to the sum of the values for h_emp in industries "C" and "D" for that country in that specific year. How can I achieve this?
Using dplyr, create a summarized df containing the C+D sums for each country and year, then bind it back to your original df. Note that your example data has multiple entries for some industries in each year/country; I assumed this was an error, so I created new sample data.
Result:
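A minimal sketch of the summarize-then-bind approach described above, applied here to the question's own df (rebuilt in a more compact form) rather than to new sample data. It assumes dplyr 1.0.0 or later is installed (for the `.groups` argument) and that each Country/Year pair has at most one row per industry. The names `cd_sums` and `result` are illustrative only.

```r
library(dplyr)

# Rebuild the question's example data (same values, more compact rep() calls)
set.seed(123)
df <- data.frame(
  Country  = rep(c("DEU", "FRA", "ITA"), each = 16),
  Year     = rep(rep(1:4, each = 4), times = 3),
  industry = rep(c("A", "B", "C", "D"), times = 12),
  h_emp    = rnorm(48, 15, 3.5)
)

# Sum h_emp over industries C and D within each Country/Year,
# labelling the new rows "C+D"
cd_sums <- df %>%
  filter(industry %in% c("C", "D")) %>%
  group_by(Country, Year) %>%
  summarise(industry = "C+D", h_emp = sum(h_emp), .groups = "drop")

# Append the new rows to the original data and sort for readability
result <- bind_rows(df, cd_sums) %>%
  arrange(Country, Year, industry)
```

Filtering before grouping keeps the A and B rows out of the sum, and `bind_rows()` matches columns by name, so the column order in `cd_sums` does not need to match `df`. If the data did contain repeated industries within a Country/Year, this sketch would add all of the C and D rows together, which may or may not be what you want.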