I am new to this platform. I am trying to account for heteroscedasticity in my multiple linear regression model by using weighted least squares (WLS) estimation in R.
To do this, I take the residuals from my OLS regression, square them, and regress them on my predictors. The inverses of these predicted squared residuals are my weights for the WLS regression.
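The two-step procedure just described can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the real data set: the data frame `sim` and the variables `x1`, `x2` and `y` are hypothetical. A linear regression of the squared residuals is not constrained to predict positive values, so the sketch counts the non-positive predictions before attempting the WLS fit:

```r
set.seed(42)
n <- 200
# Hypothetical simulated data: the error SD grows with x1 (heteroscedastic)
sim <- data.frame(x1 = runif(n, 1, 10), x2 = rnorm(n))
sim$y <- 2 + 0.5 * sim$x1 - sim$x2 + rnorm(n, sd = 0.3 * sim$x1)

# Step 1: ordinary least squares
ols <- lm(y ~ x1 + x2, data = sim)

# Step 2: regress the squared OLS residuals on the same predictors
sim$res_sq <- residuals(ols)^2
aux <- lm(res_sq ~ x1 + x2, data = sim)
var_hat <- fitted(aux)  # predicted squared residuals (variance estimates)

# A linear auxiliary fit can predict zero or negative values
n_nonpos <- sum(var_hat <= 0)

# Step 3: WLS with weights = 1 / predicted squared residuals
if (n_nonpos == 0) {
  wls <- lm(y ~ x1 + x2, data = sim, weights = 1 / var_hat)
  print(coef(summary(wls)))
} else {
  message(n_nonpos, " predicted squared residuals are <= 0; ",
          "lm() would reject the resulting weights")
}
```

Whether any predictions come out non-positive depends on the data, which is why the check sits before the WLS call.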
```r
# Fit the OLS model
model_fff <- lm(x.fff ~ x.ang + cef + x.int + pet + eid + x.opn + x.agr +
                  s.age + s.gen + s.edu + s.pap + bhc_trans,
                data = df_cleaned,
                na.action = "na.exclude")

# Obtain squared residuals from the OLS model
residuals_squared_fff <- residuals(model_fff)^2

# Regress the squared residuals on the same predictors
weights_lm_fff <- lm(residuals_squared_fff ~ x.ang + cef + x.int + pet + eid + x.opn + x.agr +
                       s.age + s.gen + s.edu + s.pap + bhc_trans,
                     data = df_cleaned,
                     na.action = "na.exclude")

# Store the predicted squared residuals; their inverses are used as weights
df_cleaned$gewichte_fff <- fitted.values(weights_lm_fff)

# Fit the WLS model using the weights
model_fff_wls <- lm(x.fff ~ x.ang + cef + x.int + pet + eid + x.opn + x.agr +
                      s.age + s.gen + s.edu + s.pap + bhc_trans,
                    data = df_cleaned,
                    weights = 1/df_cleaned$gewichte_fff,
                    na.action = "na.exclude")
```
However, I am receiving this error message:

```
Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,  :
  missing or negative weights not allowed
```
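For reference, here is a minimal sketch on a tiny made-up data set showing when `lm()` raises this error (`d`, `fit_msg` and the weight vectors are hypothetical). The check inside `lm.wfit()` fires for any weight that is negative, or that is `NA` and still present after `na.action` has been applied. With `na.exclude`, a row whose weight is `NA` is dropped from the model frame before fitting:

```r
# Tiny hypothetical data set
d <- data.frame(x = 1:5, y = c(1.0, 2.1, 2.9, 4.2, 5.1))

# Hypothetical helper: return the error message of an expression, or "no error"
fit_msg <- function(expr) {
  tryCatch({ expr; "no error" },
           error = function(e) conditionMessage(e))
}

# A negative weight is rejected
msg_neg <- fit_msg(lm(y ~ x, data = d, weights = c(1, 1, -0.5, 1, 1)))

# An NA weight is rejected when na.action keeps the row (na.pass) ...
msg_na_pass <- fit_msg(lm(y ~ x, data = d, weights = c(1, 1, NA, 1, 1),
                          na.action = na.pass))

# ... but with na.exclude the row with the NA weight is dropped before fitting
fit_na_excl <- lm(y ~ x, data = d, weights = c(1, 1, NA, 1, 1),
                  na.action = na.exclude)

print(msg_neg)            # "missing or negative weights not allowed"
print(msg_na_pass)        # "missing or negative weights not allowed"
print(nobs(fit_na_excl))  # 4
```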
I thought that setting `na.action = "na.exclude"` would deal with the missing values, but it seems that this is not the case.

When checking for NAs, I found that both `residuals_squared_fff` and `gewichte_fff` contain NAs. I thought of omitting them, but that does not work because the variable lengths then differ:
```
Error in model.frame.default(formula = residuals_squared ~ x.ang + cef +  :
  variable lengths differ (found for 'x.ang')
```
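The length behavior behind this can be sketched on a small made-up data frame (`dat` and its columns are hypothetical). With `na.exclude`, `residuals()` and `fitted()` return vectors padded with `NA` to the full number of rows, whereas with `na.omit` the incomplete rows are dropped. A shorter vector used in a formula together with columns of the full data frame produces the "variable lengths differ" error:

```r
# Tiny hypothetical data set with one missing y and one missing x
dat <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
                  x = c(1, 2, 3, NA, 5, 6))

fit_excl <- lm(y ~ x, data = dat, na.action = na.exclude)
fit_omit <- lm(y ~ x, data = dat, na.action = na.omit)

# na.exclude pads the residuals with NA, so they line up with the rows of dat
r_excl <- residuals(fit_excl)
# na.omit drops the incomplete rows, so the vector is shorter than dat
r_omit <- residuals(fit_omit)

length(r_excl)      # 6
sum(is.na(r_excl))  # 2
length(r_omit)      # 4

# The padded vector can be combined with the columns of dat ...
dat$res_sq <- r_excl^2
aux <- lm(res_sq ~ x, data = dat, na.action = na.exclude)
length(fitted(aux))  # 6, with NA for the incomplete rows

# ... whereas the shorter vector cannot
res_sq_omit <- r_omit^2
msg_len <- tryCatch(lm(res_sq_omit ~ x, data = dat),
                    error = function(e) conditionMessage(e))
print(msg_len)  # "variable lengths differ (found for 'x')"
```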
Any ideas on what I should do?