I have problem related to curve_fit in Python. Specifically, I am trying to model double exponential data.
I am trying to model data. The fits are good by the eye, but when I look at the parameters and errors I am shocked. The errors are huge (few orders greater then the mean of the parameter). Besides that I have to provide very specific initial guesses, otherwise it is useless. Do you have any tip?
X-axis is time and Y-axis is intensity. This method works fine for the blue plot and the estimated parameters are not that huge (reasonable more or less). I am facing problem when try to fit it to orange data. Here are the dataset:
Time [-0.5 -0.4 -0.3 -0.2 -0.1 0. 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.27 0.3 0.33 0.36 0.39 0.42 0.45 0.48 0.51 0.54 0.57 0.6 0.63 0.66 0.69 0.72 0.75 0.78 0.81 0.84 0.87 0.9 0.93 0.96 0.99 1.02 1.05 1.08 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. ]
Gain (blue) [0. 0.08362 0.05064 0.0498 0.03706667 0.27215333 1.47517 1.62150333 1.72074667 1.85051667 1.81913 1.89994 1.96591667 1.98604667 2.08784333 2.07969333 2.09007 2.06054333 2.1477 2.08343 2.08434 2.02111 1.96873 1.96056333 1.89549667 1.82184333 1.70469 1.47502667 1.30257667 1.09063333 0.95496333 0.87091333 0.73064667 0.59721333 0.53186333 0.47225 0.41566667 0.36025 0.31752 0.30499667 0.26868 0.21355667 0.19169 0.17645 0.14872667 0.10706667 0.12724667 0.1013 0.08720333 0.09477333 0.09134 0.10958 0.08022667 0.09363 0.08076667 0.09006 0.09902 0.07093333 0.09918 0.10192 0.09432333 0.09079333 0.10425 0.09839667]
Loss - orange [ 0. -0.12552667 -0.15764 -0.19444 -0.26223 -0.35566333 -1.2022 -1.30988667 -1.44820667 -1.49741333 -1.66169333 -1.72277667 -1.81840667 -1.92772333 -1.97253333 -2.10094667 -2.21817667 -2.24875333 -2.43420667 -2.47216 -2.50555667 -2.62232333 -2.65796667 -2.68747333 -2.69612667 -2.71442 -2.59840333 -2.56071667 -2.44539333 -2.28093667 -2.23737 -2.09984333 -2.07465667 -2.04232667 -1.95166667 -1.93045333 -1.89963333 -1.92825667 -1.98861667 -1.93057667 -2.00662333 -1.91338667 -1.92764 -2.03428 -2.04880667 -1.98233667 -1.90731333 -2.00939667 -1.92832 -1.93799 -1.89133 -1.90807 -1.97912 -1.99034667 -1.95203333 -1.85398667 -1.86426667 -1.85729667 -1.86018667 -1.92405 -1.88595667 -1.88160667 -1.82243333 -1.87084667]
How to deal with double exp fitting?
Plots: https://ibb.co/mCs39VP
Model:a*e^(-x*b)+c*e^(-x*d)+e
Paramters (for lower fit): [-1622.6092323 -15.038877 1713.45499496 -15.33871455
-1.92259733]
Errors [4.73773551e+06 4.17637511e+02 4.73736168e+06 4.37257698e+02
1.20927662e-02
Plots:
Here is my code for the orange data fit:
strong text
{def exp(x, a, b, c, d, e):
return a*np.exp(b*x)+c*np.exp(d*x)+e
n=25
y_l=y_l[n:]
x_l=x_l[n:]
par, cov=opt.curve_fit(exp,x_l, y_l, p0=[-5, -7, 30, -20, -1.9], bounds=((-np.inf, -np.inf,-np.inf,-np.inf, -np.inf ),(np.inf, 0, np.inf, 0, np.inf)), maxfev=5000)
print(par)
parerr=np.sqrt(np.diag(cov))
print("Errors",parerr)
mx_l=np.linspace(np.min(x_l), np.max(x_l), 100)
my_l=exp(mx_l, par[0], par[1], par[2], par[3], par[4])
plt.plot(mx_l, my_l, label="fit", color="blue")
plt.show()}



TL;DR
You are facing at least three issues:
x <= 0);MCVE
Lets build a MCVE to highlight challenges of your fitting problem.
For the model you have referenced we have the following function:
Which is not trivial to fit because it is redundant (there is two identical exponential effect in it). Meaning the solver will be confused when performing loss gradient descent starting from unit point.
We generate some similar data to you dataset:
Notice your parameters set does not define a convergent function because both exponential are increasing. This solution certainly cannot fit your data and therefore their uncertainties are huge.
We generate some synthetic dataset:
Here is the key of the optimization, we only tweak one initial parameters to be negative in order to help the solver to discriminate between the two exponentials and optimize it each against the other:
We can see, in this case, the convergence is optimal (we recover parameters with appreciable errors on it).
Fitness is also correct:
Other models
Increase order of one of the expontentials
Another model of interest for points
x>=0, with 4 parameters only could be the following:Which give some fitness for gain curve while having no problem to converge:
But clearly lack of fit for loss curve. On the other hand, it fits seamlessly because there are two orthogonal functions instead of two copies of the same function.
Sum of logistic functions
@JJacquelin proposal clearly offer better fitness for the complete dataset:
It only requires an educated guess to discriminate both effects (sum of identical function):