How can I fix this invalid type error using lm()?

Error in model.frame.default(formula = data$conservationstatus ~ data$latitude,  : 
  invalid type (NULL) for variable 'data$conservationstatus'

I have a dataset (called data, after reading a CSV file), and it has the columns Conservation Status and Latitude. I'm trying to perform linear regression on these two using

lm(data$ConservationStatus ~ data$Latitude, data = data)

However, I keep getting the error above. It seems like it's because my column name has two words in it. I've tried data$Conservation Status, data$'Conservation Status', data$Conservation.Status, but nothing seems to work :(


There are 2 best solutions below


We can specify the formula without data$, since lm() looks up the variables in the data argument. If the column name has spaces, wrap it in backticks:

model <- lm(`Conservation Status` ~ Latitude, data = data)
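To see why the original call fails, here is a minimal sketch on made-up data. The data frame `df` and its values are hypothetical, not the asker's dataset, and it assumes the column really is named "Conservation Status" with a space:

```r
# Hypothetical toy data; the values are made up for illustration only.
df <- data.frame(
  `Conservation Status` = c(1, 2, 2, 3, 4, 5),
  Latitude = c(10.5, 22.1, 25.3, 31.0, 44.8, 52.7),
  check.names = FALSE  # keep the space in the column name
)

# A column name that does not exist returns NULL, which is the
# "invalid type (NULL)" that lm() complains about.
missing_col <- df$ConservationStatus
is.null(missing_col)

# Backticks let the formula refer to the non-syntactic name.
fit <- lm(`Conservation Status` ~ Latitude, data = df)
coef(fit)
```

The same backtick quoting also works with `$`, for example df$`Conservation Status`.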

The error can be reproduced with a simple example that misspells a column name (epal.Length instead of Sepal.Length):

data(iris)
lm(iris$epal.Length ~ iris$Species, iris)

Error in model.frame.default(formula = iris$epal.Length ~ iris$Species, : invalid type (NULL) for variable 'iris$epal.Length'

Using the correct column name and formula syntax:

lm(Sepal.Length ~ Species, iris)

#Call:
#lm(formula = Sepal.Length ~ Species, data = iris)

#Coefficients:
#      (Intercept)  Speciesversicolor   Speciesvirginica  
#            5.006              0.930              1.582  

You are likely misspelling a variable name; check colnames(data) to see how it is spelled.

For instance,

lm(mtcars$MPG ~ mtcars$disp)
# Error in model.frame.default(formula = mtcars$MPG ~ mtcars$disp, drop.unused.levels = TRUE) : 
#   invalid type (NULL) for variable 'mtcars$MPG'

lm(mtcars$mpg ~ mtcars$disp)
# Call:
# lm(formula = mtcars$mpg ~ mtcars$disp)
# Coefficients:
# (Intercept)  mtcars$disp  
#    29.59985     -0.04122  

Note that mpg is lower-case, not upper-case as in my first attempt.

colnames(mtcars)
#  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
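For a CSV header that contains a space, the name you end up with depends on how the file was read. The sketch below uses a small inline CSV (hypothetical contents, standing in for the asker's file) to show what base R's read.csv() does with and without check.names:

```r
# Hypothetical two-row CSV with a space in the first header.
csv_text <- "Conservation Status,Latitude\n3,10.5\n1,52.7"

# Default (check.names = TRUE): headers go through make.names(),
# so the space is replaced by a dot.
names_default <- names(read.csv(text = csv_text))
names_default

# check.names = FALSE keeps the header exactly as written in the file,
# so the column then needs backticks in a formula.
names_raw <- names(read.csv(text = csv_text, check.names = FALSE))
names_raw
```

Whichever spelling colnames() reports is the one to use in the formula.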

I agree with @akrun's suggestion to instead use the formula form, e.g. lm(mpg ~ disp, data = mtcars), as it seems more readable (to me). Note that a misspelled column name then produces a different error, "object 'MPG' not found", rather than "invalid type (NULL)".
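As a rough comparison of the two error messages, the following sketch captures both with tryCatch(). It assumes a fresh session with no variable called MPG in the workspace and an English locale for the message text:

```r
# Misspelled name with the $ form: the variable evaluates to NULL.
msg_dollar <- tryCatch(
  lm(mtcars$MPG ~ mtcars$disp),
  error = function(e) conditionMessage(e)
)

# Misspelled name with the formula + data form: the name cannot be found.
msg_formula <- tryCatch(
  lm(MPG ~ disp, data = mtcars),
  error = function(e) conditionMessage(e)
)

msg_dollar
msg_formula
```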