How to do exponential curve fitting for y = a*(1- exp(-b*t)

217 Views Asked by At

I have time series data where values increase over time (t) and then become stable: I would like to fit an exponential curve NPQ_Lss = a*(1- exp(-b*t). Later I would like to compare the fitted curves between populations (P1 and P2) to find out if they are significantly different.

How could I do so in R? I understand the limitation that I have few data points.

My data looks like this:

t NPQ_Lss Population
-2 1.777 P2
0 2.224 P2
3 2.492 P2
5 2.384 P2
7 2.407 P2
-2 1.719 P1
0 2.191 P1
3 2.418 P1
5 2.303 P1
7 2.403 P1
2

There are 2 best solutions below

1
Ben Bolker On BEST ANSWER

In principle,

library(nlme)
fit <- gnls(NPQ_Lss ~ SSasympOrig(t, a, log_b),
            data = DF,
            params = a + log_b ~ Population)

should fit the model with population-specific parameters. (SS stands for "self-start", so you shouldn't need to provide starting values if your data are reasonably well-behaved.)

But, it actually doesn't: the main problem here is that your data are actually a terrible match for the curve a*(1-exp(-b*t)), which assumes that the curves passes through y=0 at time 0. (Your curve is most of the way to its asymptote at time t=0.)

Fitting to the data as a whole with SSasymp, which doesn't assume the curve goes through the origin, works OK:

fit <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
            data = DF)

Unfortunately this specification, which should work, doesn't (" supplied 3 starting values, need 6")

fit2 <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
            data = DF,
            params = a + y0 + log_b ~ Population)

Let's try giving it starting values

fit2 <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
            data = DF,
            params = a + y0 + log_b ~ Population,
            start = c(rbind(coef(fit), 0)))

The start value is a sneaky way of setting the starting values to {a, 0, y0, 0, log_b, 0}, knowing that this model will be parameterized according to {"a for population 1", "difference in a between pops", "y0 for P1", "y0 diff", "log_b for P1", "log_b diff"} ...

Comparing the models with those fitted by @G.Grothendieck (using method = "plinear", which is a little more robust):

newdf <- expand.grid(Population = c("P1", "P2"), t = seq(-3, 10, length = 51))
df_pred <- (list(orig_0=fm1, orig_pop = fm2, flex_0 = fit, flex_pop = fit2)
    |> purrr::map_dfr(~ data.frame(newdf, NPQ_Lss = predict(., newdata = newdf)), .id = "model")
)

library(ggplot2); theme_set(theme_bw())
ggplot(df_pred,
       aes(t, NPQ_Lss, colour = Population)) +
    geom_line(aes(linetype = model)) +
    geom_point(data = DF)
ggsave("SO77667833.png")

enter image description here

The model that insists on fitting a curve through (0,0) gives a prediction (the concave-up/accelerating curve) that is technically correct (it's the least-squares fit of that model to these data), but ridiculous; the model (SSasymp) that doesn't constrain the fit to go through the origin gives reasonable results.

summary() of the two-population model (fit2) will give you the p-values for the individual parameter differences; anova(fit, fit2) as suggested by @G.Grothendieck will test the differences in the parameters jointly.

2
G. Grothendieck On

Using the reproducible data in the Note at the end, we found that the original model did not fit well so we will use a 3 parameter self starting logistic model SSlogis as shown instead.

fm1 and fm2 are the models for populations P1 and P2. fm3 is the model for separate fits for each population and fm4 is the model with a single fit to both populations.

We can use anova to test if fm3 and fm4 are significantly different.
With a p value of 0.732 they are not significantly different.

DF$Population <- factor(DF$Population)  # important

fm1 <- nls(NPQ_Lss ~ SSlogis(t, Asym, mid, scal), DF, subset = Population == "P1")
fm2 <- update(fm1, subset = Population == "P2")

fo <- NPQ_Lss ~ Asym[Population] / 
  (1 + exp((mid[Population] - t)/scal[Population]))
st <- Map(c, coef(fm1), coef(fm2))
fm3 <- nls(fo, DF, start = st) # fit each pop separately
fm4 <- update(fm1, subset = TRUE) # single fit to both

anova(fm3, fm4)
## Analysis of Variance Table
## 
## Model 1: NPQ_Lss ~ Asym[Population]/(1 + exp((mid[Population] - t)/scal[Population]))
## Model 2: NPQ_Lss ~ SSlogis(t, Asym, mid, scal)
##   Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
## 1      4   0.018712                             
## 2      7   0.025006 -3 -0.0062939  0.4485  0.732

plot(NPQ_Lss ~ t, DF, col = Population, pch = 20, cex = 2)
lines(fitted(fm3) ~ t, DF, subset = Population == "P1", col = 1)
lines(fitted(fm3) ~ t, DF, subset = Population == "P2", col = 2)
lines(fitted(fm4) ~ t, DF, subset = Population == "P1", col = 3)

Dots are data. Lines are fitted values. Black is P1. Red is P2. Green is single fit to both populations (fm4).

screenshot

Old

This is the original answer but after looking at it it seems not to provide a good fit.

Assuming the data shown reproducibly in the Note at the end, fit each with separate parameters and then fit both with the same parameters and then do an anova. It gives a P value of 0.998 which is not significant so we can use the same parameters for both.

# separate parameters
DF$Population <- factor(DF$Population)
fo1 <- NPQ_Lss ~ a[Population]*(1 - exp(-b[Population]*t))
fm1 <- nls(fo1, DF, start = list(a = c(-2, -2), b = c(-0.2, -0.2)))

# same parameters
fo2 <- NPQ_Lss ~ cbind(a = (1 - exp(-b*t)))
fm2 <- nls(fo2, DF, start = list(b = -0.2), alg = "plinear")

anova(fm2, fm1)

## Analysis of Variance Table
##
## Model 1: NPQ_Lss ~ cbind(a = (1 - exp(-b * t)))
## Model 2: NPQ_Lss ~ a[Population] * (1 - exp(-b[Population] * t))
##   Res.Df Res.Sum Sq Df   Sum Sq F value Pr(>F)
## 1      8     25.136                           
## 2      6     25.134  2 0.001791   2e-04 0.9998

Note

DF <- data.frame(
  t = rep(c(-2L, 0L, 3L, 5L, 7L), 2),
  NPQ_Lss = c(1.777, 2.224, 2.492, 2.384, 2.407, 1.719, 2.191, 2.418, 2.303, 2.403),
  Population = rep(c("P2", "P1"), each = 5L)
)