I'm working with a dataset of tree species that are mostly found along coastal areas in Southeast Asia, in order to produce a species distribution model and look for novel survey sites in New Guinea. After filtering the data to remove spatial/temporal/geographic biases, I have a rather small dataset of 70 occurrences, and when I run the data in biomod2 (or ecospat) in R, 45 of my 70 occurrences are removed because of NA values in my ISRIC soil data layers (phh2o, N, silt).
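Before choosing a fix, it can help to confirm exactly which occurrences fall on NA soil cells. A minimal sketch with terra, assuming a toy raster (`soil_demo`) and made-up coordinates (`occ`), both hypothetical stand-ins for the real soil stack and occurrence table:

```r
library(terra)

# Hypothetical stand-in for one soil layer: a 10 x 10 lon/lat grid whose
# three eastern columns are NA, mimicking the coastal gap in the soil data.
soil_demo <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
                  ymin = 0, ymax = 10, crs = "EPSG:4326")
col_idx <- ((seq_len(ncell(soil_demo)) - 1) %% ncol(soil_demo)) + 1
v <- as.numeric(seq_len(ncell(soil_demo)))
v[col_idx >= 8] <- NA
values(soil_demo) <- v
names(soil_demo) <- "phh2o"

# Hypothetical occurrence coordinates (longitude, latitude)
occ <- data.frame(lon = c(2.5, 5.5, 8.5, 9.5), lat = c(5.5, 5.5, 5.5, 1.5))
occ_v <- vect(occ, geom = c("lon", "lat"), crs = "EPSG:4326")

# Extract the soil value under each point and drop terra's ID column
vals <- extract(soil_demo, occ_v)[, -1, drop = FALSE]

# Occurrences with any NA predictor are the ones the modelling step discards
na_rows <- !complete.cases(vals)
cat(sum(na_rows), "of", nrow(occ), "occurrences have NA soil values\n")
occ[na_rows, ]
```

With the real data, the same `extract()` call on the aggregated soil stack and the occurrence points would be expected to reproduce the 45 of 70 count, and the flagged rows are the ones worth inspecting in QGIS.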
For a visual, the QGIS screenshots below show occurrence points that fall on land in my elevation layer (light green) but appear to be in the ocean in my soil layer (gray).

[Images: occurrence points on map; more occurrence points on map]
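One common workaround for points that sit just outside the soil product's coastline is to take the value of the nearest non-NA cell, capped at a maximum distance so that points far from any soil data stay NA. The sketch below assumes a projected CRS in metres (it uses plain Euclidean distance) and a hypothetical helper `nearest_value()`; the layer and coordinates are invented for illustration:

```r
library(terra)

# Hypothetical soil layer in a projected CRS (map units = metres, 1 km cells);
# the three eastern columns are NA to mimic cells the soil product treats as sea.
soil_demo <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10000,
                  ymin = 0, ymax = 10000, crs = "EPSG:3857")
col_idx <- ((seq_len(ncell(soil_demo)) - 1) %% ncol(soil_demo)) + 1
v <- as.numeric(seq_len(ncell(soil_demo)))
v[col_idx >= 8] <- NA
values(soil_demo) <- v

# Hypothetical helper: for each point, return the value of the nearest non-NA
# cell centre (Euclidean distance, so the raster must be in a projected CRS),
# or NA if that cell is farther than max_dist map units.
nearest_value <- function(r, xy, max_dist) {
  ok <- as.data.frame(r[[1]], xy = TRUE, na.rm = TRUE)  # columns x, y, value
  vapply(seq_len(nrow(xy)), function(i) {
    d <- sqrt((ok$x - xy[i, 1])^2 + (ok$y - xy[i, 2])^2)
    j <- which.min(d)
    if (d[j] <= max_dist) ok[j, 3] else NA_real_
  }, numeric(1))
}

# Hypothetical occurrences: one on valid soil, one 2 km and one 3 km from data
occ_xy <- cbind(x = c(2500, 8500, 9500), y = c(5500, 5500, 1500))
snapped <- nearest_value(soil_demo, occ_xy, max_dist = 2500)
snapped
```

The `max_dist` cap is the key design choice: keeping it to one or two cells avoids assigning a beach or mangrove occurrence the soil of a distant inland cell.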
Is there an established method for predicting or filling these NA areas in my raster data? How have other people handled soil data in coastal areas?
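One way to fill the coastal NA band is to extrapolate inland values outward a few cells with a moving-window mean, then clip the result back to land using the elevation layer. A minimal sketch, assuming a recent terra version (for the `na.policy` argument of `focal()`) and hypothetical objects `soil_demo`, `land`, and `fill_na_focal()`:

```r
library(terra)

# Hypothetical soil layer with an NA band in its three eastern columns
soil_demo <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
                  ymin = 0, ymax = 10, crs = "EPSG:4326")
col_idx <- ((seq_len(ncell(soil_demo)) - 1) %% ncol(soil_demo)) + 1
v <- as.numeric(seq_len(ncell(soil_demo)))
v[col_idx >= 8] <- NA
values(soil_demo) <- v

# Hypothetical land mask (e.g. derived from the elevation layer): only the
# last column is sea (NA); columns 8-9 are land that the soil product missed.
land <- rast(soil_demo)
values(land) <- ifelse(col_idx == 10, NA, 1)

# Grow valid values into NA cells by one cell per iteration.
# na.policy = "only" leaves existing values untouched;
# na.rm = TRUE averages only the non-NA neighbours in the 3 x 3 window.
fill_na_focal <- function(r, n_iter) {
  for (i in seq_len(n_iter)) {
    r <- focal(r, w = 3, fun = "mean", na.policy = "only", na.rm = TRUE)
  }
  r
}

filled <- fill_na_focal(soil_demo, n_iter = 3)
# Clip the filled surface back to land so values do not leak into the sea
filled_land <- mask(filled, land)

n_na <- function(r) global(is.na(r), "sum")[1, 1]
c(before = n_na(soil_demo), filled = n_na(filled), masked = n_na(filled_land))
```

Each iteration extends values by one cell, so `n_iter` sets how far values are extrapolated; keeping it small limits how much invented soil information enters the model.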
I have already made sure my data is aligned, cropped, and projected properly. I have also considered reducing my dataset to only the occurrence points that fall where soil data are present and running an ensemble of small models (Breiner et al. 2015, 2018). However, I am still worried about how representative the model will be if I have to exclude so many occurrence points in coastal areas.
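The idea behind an ensemble of small models is to fit many bivariate models (one per pair of predictors), which stays feasible with few occurrences, and then average their predictions weighted by performance. The sketch below illustrates that idea with simulated presence/background data and plain `glm()`; it is not ecospat's implementation (`ecospat.ESM.Modeling()`), and the data, weighting, and names are assumptions:

```r
# Hypothetical presence/background table: 25 presences, 200 background points,
# four simulated predictors (names echo the soil layers; values are random).
set.seed(42)
n_pres <- 25
n_bg <- 200
dat <- data.frame(
  pa       = c(rep(1, n_pres), rep(0, n_bg)),
  phh2o    = c(rnorm(n_pres, 6.0, 0.5), rnorm(n_bg, 5.5, 1)),
  nitrogen = c(rnorm(n_pres, 2.0, 0.5), rnorm(n_bg, 1.5, 1)),
  silt     = rnorm(n_pres + n_bg, 30, 10),
  elev     = c(rnorm(n_pres, 20, 10), rnorm(n_bg, 300, 200))
)
preds <- c("phh2o", "nitrogen", "silt", "elev")

# Every bivariate combination of predictors: choose(4, 2) = 6 small models
pairs <- combn(preds, 2, simplify = FALSE)

# AUC via the Mann-Whitney rank formula
auc <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

fits <- lapply(pairs, function(v) {
  glm(reformulate(v, response = "pa"), family = binomial, data = dat)
})

# Weight each small model by Somers' D = 2 * AUC - 1, dropping models no
# better than random (AUC <= 0.5). Training AUC keeps the sketch short;
# a real ESM would score each model on held-out data.
aucs <- vapply(fits, function(m) auc(fitted(m), dat$pa), numeric(1))
w <- pmax(2 * aucs - 1, 0)

# Ensemble suitability = weighted mean of the bivariate predictions
pred_mat <- vapply(fits, function(m) unname(fitted(m)), numeric(nrow(dat)))
ensemble <- as.vector(pred_mat %*% w) / sum(w)
summary(ensemble)
```

Because each model uses only two predictors, the number of parameters per model stays low even with a few dozen occurrences, which is the motivation for ESMs with small samples.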
Any thoughts or suggestions would be much appreciated!
```r
# 2013 WDC Soils
library(terra)

# load soils --------
filelist_temp <- list.files(path = "map/", full.names = TRUE)
soil30 <- rast(filelist_temp)
res(soil30)
ext(soil30)
cec1k <- rast("C:/intsia/cec_0-5cm_mean_1000.tif")
# reproject onto the soil30 grid via bilinear interpolation
cec <- project(cec1k, soil30, method = "bilinear")
res(cec)
ext(cec)
# stack soils
soil <- c(cec, soil30)
# crop to sea extent
soil_crp <- crop(soil, e.bij)
# aggregate 5 x 5 blocks; na.rm = TRUE keeps a block's mean when only some
# of its cells are NA, instead of turning the whole block NA along the coast
soil5 <- aggregate(soil_crp, fact = 5, fun = "mean", na.rm = TRUE)
rm(soil, cec, cec1k, soil30, soil_crp)
# check crs
crs(soil5[[1]]) == crs(wind5[[1]]) # TRUE
# check ext
ext(soil5[[1]]) == ext(wind5[[1]]) # TRUE
```
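The aggregation step matters for the coastal gap: terra's `aggregate()` with `fun = "mean"` returns NA for any block containing an NA cell unless `na.rm = TRUE` is passed, so at `fact = 5` a single NA cell blanks a whole 5 x 5 block. A small check on a hypothetical 10 x 10 raster:

```r
library(terra)

# Hypothetical 10 x 10 layer with a single NA cell in its top-left corner
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:4326")
v <- as.numeric(seq_len(ncell(r)))
v[1] <- NA
values(r) <- v

agg_plain <- aggregate(r, fact = 5, fun = "mean")                # NA spreads to the block
agg_narm  <- aggregate(r, fact = 5, fun = "mean", na.rm = TRUE)  # NA cell ignored

n_na <- function(x) global(is.na(x), "sum")[1, 1]
c(plain = n_na(agg_plain), na_rm = n_na(agg_narm))
```

With `na.rm = TRUE`, coastal blocks that are only partly NA keep a value, which may recover some occurrences before any gap filling is needed.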