A polynomial regression of cover on the number of days since the start of observations gives a plot in which the confidence limits (CL) are shown across the entire range of x-values of the curve. When I change the x-axis ticks and labels to specific dates, the CL no longer extend to the first or last points. How can this be corrected?
The full code that produces the plot against days is:
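library(readr)    # read_csv()
library(ggplot2)  # ggplot(), stat_smooth()
frpd <- read_csv("frpd.csv")
ggplot(frpd, aes(x = Days, y = cover)) + geom_point()
# 10-fold cross-validation to choose the polynomial degree (1 to 5)
set.seed(1)  # make the random fold assignment reproducible
frpd.shuffled <- frpd[sample(nrow(frpd)), ]
K <- 10
degree <- 5
folds <- cut(seq(1, nrow(frpd.shuffled)), breaks = K, labels = FALSE)
mse <- matrix(data = NA, nrow = K, ncol = degree)
for (i in 1:K) {
  testIndexes <- which(folds == i, arr.ind = TRUE)
  testData <- frpd.shuffled[testIndexes, ]
  trainData <- frpd.shuffled[-testIndexes, ]
  for (j in 1:degree) {
    fit.train <- lm(cover ~ poly(Days, j), data = trainData)
    fit.test <- predict(fit.train, newdata = testData)
    mse[i, j] <- mean((fit.test - testData$cover)^2)
  }
}
colMeans(mse)  # mean test MSE for each degree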
best <- lm(cover ~ poly(Days, 2, raw = TRUE), data = frpd)  # final quadratic fit on all the data
summary(best)
ggplot(frpd, aes(x = Days, y = cover)) +
  geom_point() +
  # linewidth needs ggplot2 >= 3.4.0 (use size = 1 on older versions)
  stat_smooth(method = 'lm', formula = y ~ poly(x, 2), linewidth = 1) +
  xlab('Days') +
  ylab('cover')
This produces the following plot: [plot of cover against Days]
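For comparison, here is a sketch of what the smoothing layer draws, using simulated data because frpd.csv is not available (the data frame below is a hypothetical stand-in, not the real observations). The orthogonal `poly(x, 2)` used in `stat_smooth()` and the raw polynomial in `best` have different coefficients but give the same fitted curve, and the shaded band corresponds to the 95% confidence interval from `predict()` on an `lm` fit, which `stat_smooth()` evaluates by default at 80 points spanning only the observed range of x (`n = 80`, `fullrange = FALSE`).

```r
# Hypothetical stand-in for frpd.csv: a quadratic trend in cover plus noise
set.seed(1)
frpd <- data.frame(Days = sort(sample(1:362, 60)))
frpd$cover <- 20 + 0.5 * frpd$Days - 0.0013 * frpd$Days^2 + rnorm(60, sd = 8)

best   <- lm(cover ~ poly(Days, 2, raw = TRUE), data = frpd)  # as in the question
m_orth <- lm(cover ~ poly(Days, 2), data = frpd)               # what stat_smooth() fits

# Different coefficients, identical fitted curve
all.equal(unname(fitted(best)), unname(fitted(m_orth)))  # TRUE

# 95% confidence band on an 80-point grid over the observed range of Days
grid <- data.frame(Days = seq(min(frpd$Days), max(frpd$Days), length.out = 80))
band <- cbind(grid, predict(best, newdata = grid, interval = "confidence", level = 0.95))
head(band)  # columns: Days, fit, lwr, upr
```

Because the grid stops at the smallest and largest observed Days, the band on the Days plot should run from the first point to the last one; nothing in the x-axis breaks or labels changes that grid.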
To plot against dates, I replaced the plotting code with this:
ggplot(frpd, aes(x = Days, y = cover)) +
  geom_point() +
  stat_smooth(method = 'lm', formula = y ~ poly(x, 2), size = 1) +
  scale_x_continuous(breaks = c(15, 74, 135, 196, 257, 318),
                     labels = c("1 Feb", "1 Apr", "1 Jun", "1 Aug", "1 Oct", "1 Dec")) +
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 20)) +
  ylab('cover') +
  xlab('date')
This is the resulting plot: [plot of cover against dates]
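A sketch of an alternative: store real dates and let `scale_x_date()` place the ticks, so breaks and labels are computed from the dates themselves. The breaks above imply that day 15 is 1 Feb, i.e. day 1 is 18 Jan of a non-leap year; that start date is inferred, the year 2021 is a hypothetical choice, and the data are simulated.

```r
library(ggplot2)

# Hypothetical stand-in for frpd.csv (simulated, not the real data)
set.seed(1)
frpd <- data.frame(Days = sort(sample(1:362, 60)))
frpd$cover <- 20 + 0.5 * frpd$Days - 0.0013 * frpd$Days^2 + rnorm(60, sd = 8)

# Assumption: day 1 = 18 Jan of a non-leap year (2021 chosen arbitrarily),
# which makes day 15 = 1 Feb, day 74 = 1 Apr, ..., day 318 = 1 Dec
start <- as.Date("2021-01-18")
frpd$Date <- start + frpd$Days - 1

p <- ggplot(frpd, aes(x = Date, y = cover)) +
  geom_point() +
  # linewidth needs ggplot2 >= 3.4.0 (use size = 1 on older versions)
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), linewidth = 1) +
  scale_x_date(breaks = seq(as.Date("2021-02-01"), by = "2 months", length.out = 6),
               date_labels = "%d %b") +
  ylab("cover") +
  xlab("date")
print(p)
```

Internally a Date is a day count shifted by a constant, and a quadratic fit is unaffected by shifting x, so the curve and band should match the Days version; only the axis labelling differs.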
I also tried adding breaks at 1 and 362 with empty labels (""), but the CL still did not extend to the first and last points.
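One difference between the two versions besides the x-axis is `scale_y_continuous(limits = c(0, 100))`. In ggplot2, values outside a scale's limits are set to NA by default (`oob = censor`), and that includes the `ymin`/`ymax` of the smoothing band, which is widest at the ends of the data. If the band dips below 0 or rises above 100 there, those rows are dropped, which would look like the CL stopping short of the first and last points. This is an assumption that cannot be confirmed from the plots alone. The sketch below, on simulated data, builds both variants and compares the band data: scale limits versus `coord_cartesian(ylim = ...)`, which zooms the view without removing data.

```r
library(ggplot2)

# Hypothetical data (not the real observations): cover near 0 at both ends,
# so the lower CL is likely to fall below 0 there
set.seed(1)
frpd <- data.frame(Days = sort(sample(1:362, 60)))
frpd$cover <- pmin(pmax(0.003 * frpd$Days * (362 - frpd$Days) + rnorm(60, sd = 5), 0), 100)

base <- ggplot(frpd, aes(x = Days, y = cover)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), linewidth = 1) +
  scale_x_continuous(breaks = c(15, 74, 135, 196, 257, 318),
                     labels = c("1 Feb", "1 Apr", "1 Jun", "1 Aug", "1 Oct", "1 Dec"))

# Scale limits: band values outside 0..100 are censored to NA
p_scale <- base + scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, 20))
# Coordinate limits: only the visible window changes, the full band is kept
p_coord <- base + scale_y_continuous(breaks = seq(0, 100, 20)) +
  coord_cartesian(ylim = c(0, 100))

b_scale <- ggplot_build(p_scale)$data[[2]]  # layer 2 = stat_smooth
b_coord <- ggplot_build(p_coord)$data[[2]]
sum(is.na(b_scale$ymin) | is.na(b_scale$ymax))  # band rows lost to the scale limits
sum(is.na(b_coord$ymin) | is.na(b_coord$ymax))  # 0: nothing dropped
```

If the band reaches both ends with `coord_cartesian()` but not with scale limits, the y-axis limits rather than the x-axis breaks are the cause. Keeping the scale limits but passing `oob = scales::squish` to `scale_y_continuous()` would instead clamp out-of-range values to the limits.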