Iteration algorithm in R for least squares estimation of high dimensional regressions

56 Views Asked by lisbonscot At 18 February 2024 at 17:11

I have a matched employer-employee data set with 20 years of information on around 2.000.000 individuals and 400.000 firms. Amongst several worker- and firm-level covariates, I have worker, firm and occupation identifiers. My objective is to obtain 3 vectors of worker, firm and occupation fixed effects, estimated through a wage equation. I want to implement the Gauss-Seidel algorithm to obtain the exact least squares solution.

In a nutshell, I need to start with the three fixed effects' variables equal to zero, followed by estimation of the wage equation and update of each of the three coefficients sequentially until convergence is met, based on some convergence criterion (in this case, I'm following Guimaraes & Portugal 2010 by using a comparison between the residual sum of squares). I tried to do it in stata but it was taking too long to converge, so I'm now trying to do it in R.

My R code:

estimate_fixed_effects = function(df, max_iter = 1000, tol = 1e-6) {
     y = df$real_lrganho
     X = model.matrix(~ p_age2 + antig + antig2 + firm_age + lfprod, df)
     beta = rep(0, ncol(X))
     worker_fe = rep(0, length(unique(df$ntrab)))
     firm_fe = rep(0,length(unique(df$emp_id)))
     occupation_fe = rep(0,length(unique(df$occupation4)))
     for (iter in 1:max_iter){
         for (i in 1:length(worker_fe)) {
             y_i = y[df$ntrab == i]
             X_i = X[df$ntrab == i,]
             beta = lm(y_i - X_i %*% beta ~ 0)$coefficients
             worker_fe[i] = mean(y_i - X_i %*% beta)
             }
         for (j in 1:length(firm_fe)) {
             y_j = y[df$emp_id == j]
             X_j = X[df$emp_id == j,]
             beta = lm(y_j - X_j %*% beta ~ 0)$coefficients
             firm_fe[j] = mean(y_j - X_j %*% beta)
             }
         for (k in 1:length(occupation_fe)) {
             y_k = y[df$occupation4 == k]
             X_k = X[df$occupation4 == k,]
             beta = lm(y_k - X_k %*% beta ~ 0)$coefficients
             occupation_fe[k] = mean(y_k - X_k %*% beta)
             }
         beta_new = lm(y - rowSums(model.matrix(~ ntrab + emp_id + occupation4, df) %*% c(worker_fe, firm_fe, occupation_fe)) ~ X - 1)$coefficients
         if (max(abs(beta_new - beta))<tol) {
             break
             }
         beta = beta_new
         }
     return(list(beta = beta, worker_fe = worker_fe, firm_fe = firm_fe, occupation_fe = occupation_fe))
     }

The output should ideally be 3 vectors of estimated fixed effects for the 3 identified groups. However, after running result = estimate_fixed_effects(df), I got the error message "Error in X[df$ntrab == i, ] : (subscript) logical subscript too long". When trying to make sense of this output, I realize that I'm not entirely sure of how model.matrix() even works, and how to make the function do what I want. Any ideas?

Edit: I fixed the first error message, it had to do with a mismatch between the model matrix and df$ntrab, because of missing values on lfprod. Now I'm facing the following error which I don't understand: Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'mean': non-conformable arguments. Any ideas?

Original Q&A

Iteration algorithm in R for least squares estimation of high dimensional regressions

There are 0 best solutions below

Related Questions in R

Related Questions in ITERATION

Related Questions in REGRESSION

Related Questions in LINEAR-REGRESSION

Related Questions in LEAST-SQUARES

Trending Questions

Popular # Hahtags

Popular Questions