I keep going around in circles trying to bootstrap confidence intervals for my data. I've only got very rudimentary knowledge of stats and am having trouble adapting existing code, such as the example here.
My aim is to be able to predict the mean, confidence intervals, and sd for n values (say, 300) along the x range of the data (i.e. from 27.05575 to 144.75700, but I can truncate the data if needed for the bootstrapping process).
Sample code for generating the loess.
library(dplyr)    # for %>% and rename()
library(ggplot2)

# create a data frame
df <- data.frame(
  DBH = c(27.05575, 30.10165, 41.36365, 48.31459, 64.64380, 64.88845, 65.55535, 75.12160, 79.40695, 113.27850, 114.68800, 120.68150, 125.24300, 130.27200, 132.17600, 144.75700),
  length = c(0.0000000, 0.0000000, 0.0000000, 0.0000000, 1.5056656, 0.4686661, 1.5143648, 1.2282208, 0.3701741, 19.2412440, 51.3086010, 33.4588765, 254.6009090, 35.0538617, 59.5713370, 195.1270735),
  normalised = c(0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.005913827, 0.001840787, 0.005947995, 0.004824102, 0.001453939, 0.075574137, 0.201525600, 0.131416956, 1.000000000, 0.137681605, 0.233979278, 0.76640368)
)

# fit the loess model
model <- loess(normalised ~ DBH, data = df, span = .8)

# predict at 300 evenly spaced points across the observed DBH range
xrange <- range(df$DBH)
xseq <- seq(from = xrange[1], to = xrange[2], length.out = 300)
pred <- predict(model, newdata = data.frame(DBH = xseq), se = TRUE)
yfit <- pred$fit

predictionDataFrame <- data.frame(xseq, yfit) %>%
  rename(DBH = xseq, normalised = yfit)

# plot the fitted curve over the raw data
ggplot(data = predictionDataFrame, aes(x = DBH, y = normalised)) +
  geom_line(size = 2) +
  geom_point(data = df, aes(x = DBH, y = normalised)) +
  theme_bw()
Side note - I'd prefer a less smooth curve, but as there are some gaps in my data, I run into some weirdness when I use a lower smoothing parameter. For example, this is the curve for a span of 0.6:
Besides the 'span' parameter, are there other ways to control the loess? Changing the other parameters doesn't seem to do much. However, using the loess.boot function from the spatialEco package, the fitted curves seem more targeted than the raw loess function with 0.8 smoothing. This last image is a comparison of a couple of different measurements of mine using the loess.boot function from spatialEco (thick lines) and the loess function (dashed lines). I'd prefer not to rely on that package and instead go through the process manually so I understand what's happening.



As commented by Gregor Thomas, you have to put your code for fitting the model and getting predictions into functions. It is then relatively straightforward to use e.g. tidymodels to apply bootstrap resampling to estimate uncertainty. (Though I give no guarantees that these estimates of uncertainty are statistically sound for whatever inference you will try to use them for.)

Here is an example where I've taken your code for fitting the model and making predictions as verbatim as possible from the question and made them into functions, and then used a tidymodels approach to estimate the model and make predictions on 10k bootstrap samples:

EDIT 23-04-03: Here is an example of how to extract the SD from each point.
We can extract the predictions for each point from each bootstrap sample by unnesting our preds column. (The preds column is just a list of data frames, one for each bootstrap sample, so to extract the predictions you could use any method of row binding them together, such as boots_preds <- do.call("rbind", boots$preds).)

If we then group by term, which denotes the point in the range, and use summarize(), we can summarize whatever we want at each point in the range, including the standard deviation, mean, median, and so on. For instance:
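A minimal sketch of the whole procedure in base R, assuming the question's data and a span of 0.8. It is not the tidymodels code referred to above: resampling is done with sample(), the per-sample predictions are row-bound with do.call("rbind", ...), and the grouping by term uses split(). The names fit_and_predict, n_boot and boot_summary, the fixed seed, the 500 replicates and the 95% percentile interval are all assumptions made for illustration.

```r
# Data from the question (only the two columns used by the model)
df <- data.frame(
  DBH = c(27.05575, 30.10165, 41.36365, 48.31459, 64.64380, 64.88845,
          65.55535, 75.12160, 79.40695, 113.27850, 114.68800, 120.68150,
          125.24300, 130.27200, 132.17600, 144.75700),
  normalised = c(0, 0, 0, 0, 0.005913827, 0.001840787, 0.005947995,
                 0.004824102, 0.001453939, 0.075574137, 0.201525600,
                 0.131416956, 1, 0.137681605, 0.233979278, 0.76640368)
)

# Grid of 300 points across the observed DBH range
xseq <- seq(min(df$DBH), max(df$DBH), length.out = 300)

# Hypothetical helper: fit loess to one data set and predict on the grid.
# Warnings from loess on resamples with many ties are suppressed, and a
# failed fit yields NA predictions instead of stopping the loop.
fit_and_predict <- function(data, grid, span = 0.8) {
  est <- tryCatch(
    suppressWarnings({
      model <- loess(normalised ~ DBH, data = data, span = span)
      predict(model, newdata = data.frame(DBH = grid))
    }),
    error = function(e) rep(NA_real_, length(grid))
  )
  data.frame(term = grid, estimate = as.numeric(est))
}

set.seed(1)    # assumption: fixed seed so the sketch is reproducible
n_boot <- 500  # assumption: far fewer than 10k to keep the sketch quick

# One data frame of predictions per bootstrap sample
# (rows of df drawn with replacement)
preds <- lapply(seq_len(n_boot), function(i) {
  resample <- df[sample(nrow(df), replace = TRUE), ]
  fit_and_predict(resample, xseq)
})

# Row-bind all predictions into one long data frame
boots_preds <- do.call("rbind", preds)

# Group the estimates by grid point. loess does not extrapolate by default,
# so grid points outside a resample's DBH range are NA and are dropped.
by_term <- split(boots_preds$estimate, boots_preds$term)

boot_summary <- data.frame(
  DBH   = xseq,
  mean  = sapply(by_term, mean, na.rm = TRUE),
  sd    = sapply(by_term, sd, na.rm = TRUE),
  lower = sapply(by_term, quantile, probs = 0.025, na.rm = TRUE),
  upper = sapply(by_term, quantile, probs = 0.975, na.rm = TRUE),
  row.names = NULL
)

head(boot_summary)
```

boot_summary has one row per grid point. The lower and upper columns can be drawn as a band around the fitted curve, for example with geom_ribbon(aes(ymin = lower, ymax = upper)). Near the ends of the DBH range the summaries rest on fewer bootstrap samples, because many resamples do not reach those points.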