how do you test for collinearity of raster files in R studio

226 Views Asked by At

I have inputted bioclimatic variables into R studio in which I wish to test them for collinearity so that I can remove highly correlated variables for completion of a species distribution model. I have managed to stack the raster files so far but haven't gotten any further and cannot find a clear explanation on how to run a vif for raster's. any help would be appreciated.

I have tried making a list, however I cannot seam to get it to work, as when I test for a vif of the list created it states the file needs to be in a file. type

2

There are 2 best solutions below

0
Sean McKenzie On

I think that you can test a stacked raster data set for multicollinearity by taking a large sample of pixels, converting them to a dataframe, creating a linear model, and testing for collinerality using vif() as normal. Personally, I prefer working with the terra:: package as it has largely replace the raster:: package. Below is a reproducible example. I have included in the comments the alternative functions in raster:::

##Loading Necessary Packages##
if(!require(terra)){install.packages("terra")}
if(!require(car)){install.packages("car")}

#Setting Random Number Generator Seed For Reproducibility##
set.seed(94)

##Creating a blank raster template##
blank<-rast(ymin=45.5, ymax =45.75, xmin = -122.75, xmax=-122.5, ncols=400, nrows=400, crs="EPSG:4326") #alternative raster::raster()

##creating a raster of one predictor variable following a uniform distribution between 0 and 5000 to emulate elevation
x1<-blank
values(x1)<-runif(160000, 0, 5000)

##creating a raster of a second predictor variable ~N(mu=150, sigma=85) to emulate cation exchange capacity
x2<-blank
values(x2)<-rnorm(160000, 150, 85)

##Creating a raster of a third predictor variable heavily collinear with x1 to emulate precipitation
x3<-blank
values(x3)<-17+0.85*values(x1) + rnorm(160000, 0, 50)

##Creating a response variable ~N(mu=100, sigma=255) to emulate species abundance 
y<-blank
values(y)<-rnorm(160000, 1000, 255)

##Combining all rasters into a single multilayer raster
rst<-c(y, x1, x2, x3) #alternatively raster::stack() or raster::brick()
names(rst)<-c("abundance", "elevation", "cec", "precipitation")

##Taking a sample of 10000 pixels without replacement
smp<-spatSample(rst, 10000, replace=FALSE, na.rm=TRUE, as.df=TRUE)#Alternatively raster::sampleRandom()

##Creating a linear model of abundance as a function of elevation cation exchange capacity and precipitation
mod<-lm(abundance~., data=smp)

##Testing for multicollinearity using the Variance Inflation Factor
vif(mod)
0
Robert Hijmans On

To get the correlation between rasters you can use terra::layerCor.

library(terra)
b <- rast(system.file("ex/logo.tif", package="terra"))   
layerCor(b, "pearson")$pearson
#            red     green      blue
#red   1.0000000 0.9980961 0.9501633
#green 0.9980961 1.0000000 0.9658011
#blue  0.9501633 0.9658011 1.0000000