set up the initial guess values for curve_fit based on I/O

59 Views Asked by At

I am new in optimization and would like to write code that will determine the values of a1, a2, b1, b2 for a predefined function f_model based on the available inputs X1_data X2_data and output Y_list.

The code will be used on multiple datasets with available I/O and must meet 3 constraints:

  1. Levenberg-Marquardt algorithm must be used i.e., curve_fit(..., method='lm',...) (based on scipy documentation bounds can not be used with method='lm')
  2. 1 < b1 < a1 < a2 < b2
  3. y=((a2+(a1-a2)/(1+((x1/x2/b2)**b1)))*x2)/1000

Here is my code:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def f_model(x_data, b1, a1, a2, b2):
    x1, x2 = x_data
    if 1 < b1 < a1 < a2 < b2:
        return 1e-3 * (a2 + (a1 - a2) / (1 + ((x1 / x2 / b2) ** b1))) * x2
    else:
        return np.inf

X1_data = ...
X2_data = ...
Y_list = ...
X_axis = [i + 1 for i in range(X1_data.shape[0])]

for Y_data in Y_list:
    Y_data = np.array(Y_data)
#    if np.min(Y_data) > 1:
#        b1_guess = 1.01
#        a1_guess = np.min(Y_data)
#        a2_guess = np.max(Y_data) * 10
#        b2_guess = np.mean(X1_data / X2_data)
#        guess = [b1_guess, a1_guess, a2_guess, b2_guess]
#    else:
#        guess = [1.01, 100.0, 700.0, 10000.0]
    guess = [3.5, 100.0, 7300.00, 30000.00]

#    bounds = ([1.0, guess[0], guess[1], guess[2]], [np.inf, np.inf, np.inf, np.inf])
    popt, pcov = curve_fit(f_model, (X1_data, X2_data), Y_data, p0=guess, method='lm', maxfev=5000)
    print(f"Guess {guess}")
    print(f"Optimal {popt}")
    optimal_curve = f_model((X1_data, X2_data), *popt)
    plt.scatter(X_axis, Y_data, label='Data')
    plt.plot(X_axis, optimal_curve, label='Fitted Curve')

plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.legend()
plt.show()

The problem is that I couldn't figure out how to properly set the values for guess based on I/O. By changing the values of guess, I got several popt, from which it is difficult to determine the optimal one.

I have dataset1 to validate the code (the values of a1, a2, b1, b2 are known). Whenever I try different combinations of guess, the curve fits the data well but popt does not provide b1=3.98; a1=98.4; a2=7337.03; b2=29404.80.

If anyone could help me edit the code so that for dataset1, popt will be b1 = 3.98; a1 = 98.4; a2 = 7337.03; b2 = 29404.80. I will be happy!

ps. As soon as, code will be validated, I would like to use it for dataset2 as well.

dataset1 Y_list updated (13/12/2023)

X1_data = np.array([2894648.773, 3157919.698, 3474257.841, 3800193.356, 4156493.087, 4590298.639, 5042658.177])
X2_data = np.array([67.28579923, 69.52170573, 71.68019425, 73.68463411, 75.5179478, 77.1563269, 78.52232434])
Y_list = [[411.1179, 441.5084, 471.0992, 497.1712, 520.8332, 542.983, 560.8622]]
b1=3.98; a1=98.4; a2=7337.03; b2=29404.80

dataset2

X1_data = np.array([240228, 259164, 267785, 300530, 339812])
X2_data = np.array([32.188719, 32.710433, 33.243463, 33.80985, 34.353048])
Y_list = [[23.79969728, 25.34667811, 26.88872834, 30.86784396, 34.04270604],
          [2.70371836, 3.33359208, 3.49283256, 3.9231444, 4.55001784],
          [0.65156594, 0.66139181, 0.50957883, 0.46295474, 0.51407824]]
0

There are 0 best solutions below