I'm currently trying to use the Levenberg-Marquardt optimisation method with Keras. In the code below, I try to link Keras to "scipy.optimize.least_squares".
My goal is to retrieve the very simple relationship y = 2*x out of a training dataset.
In the function "model_residuals", I compute the residuals with both the Keras model (residual list "r") and a plain linear regression model (residual list "r2"). When "model_residuals" returns "r2" (linear regression), the fit converges nicely to the optimal solution. However, when it returns "r" (Keras), it does not converge at all.
I was able to check that the residual lists "r" and "r2" are always almost exactly the same.
Why does my program converge for "r2" but not for "r"? Could someone please tell me what is going wrong?
I copied the whole code below.
Best wishes.
```
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from scipy.optimize import least_squares
from copy import deepcopy
# Data generation.
np.random.seed(0)
n_samples = 20
X = np.linspace(-1, 1, n_samples)
y = 2 * X + np.random.randn(n_samples) * 0.1  # The relationship we want to retrieve is y = 2*x.
# Construction of the Keras ANN model.
model = Sequential()
model.add(Dense(1, input_dim=1, use_bias=False, activation='linear'))
###########################################
# Function which returns the residuals as a 1D array.
def model_residuals(params, x, y):
    print(" params =", params)
    weights = deepcopy(params)  # Copy so that "params" itself is not modified.
    weights = weights.reshape((1, 1))
    model.layers[0].set_weights([weights])  # Write the weights into the model.
    current_params = model.get_weights()[0].flatten()
    print(" weights in model =", current_params)
    r = model.predict(x).flatten() - y  # Residuals computed with the Keras model.
    r2 = params[0] * x - y              # Residuals computed with a plain linear regression model.
    print(" r  =", r)
    print(" r2 =", r2, "\n\n")
    # The residual lists "r" and "r2" are always almost identical for any parameter value!
    squared_error = np.mean(r ** 2)
    print("Squared error (Keras): ", squared_error)
    squared_error2 = np.mean(r2 ** 2)
    print("Squared error (linear):", squared_error2)
    # Likewise, the two squared errors are almost identical for any parameter value!
    return r  # Return "r2" instead to fit the plain linear regression model.
########################################
init_params = model.get_weights()[0].flatten()  # Initial weights as a 1D array.
init_params = [-9]  # Override the initial coefficient with -9, far from the optimum, to make the fit more challenging.
print(" init_params =", init_params)
max_nfev = 1000  # Maximal number of function evaluations.
# Fit the coefficient to the data, using either the plain linear regression model
# ("model_residuals" returns "r2") or the Keras ANN ("model_residuals" returns "r").
result = least_squares(model_residuals, init_params, args=(X, y), method='lm',
                       xtol=1e-6, max_nfev=max_nfev, jac='2-point')
print("\n","\n"," result = ",result,"\n","\n");
# Set the optimised weights.
opt_weights = [result.x.reshape((1, 1))]
print("\n\n opt_weights =", opt_weights, "\n\n")
model.layers[0].set_weights(opt_weights)  # Write the optimised weights into the model.
# Prediction with the optimised model.
X_test = np.linspace(-1, 1, 40)
y_test = 2 * X_test
y_pred = model.predict(X_test)
# Plot the quality of the fit.
plt.scatter(X_test, y_test, color='blue', label='Data')
plt.plot(X_test, y_pred.flatten(), color='red', label='Fit')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
```
That was a good one :) The default datatype of a Keras model is float32, while "scipy.optimize.least_squares" works in float64. The Jacobian is estimated by finite differences with a step of roughly sqrt(machine epsilon) for float64, i.e. about 1e-8 relative to the parameter. When such a tiny perturbation is written into a float32 weight (about 7 significant digits), it is rounded away, so the finite differences are pure round-off noise and the optimiser receives a useless Jacobian. That is why "r2" (computed in float64 NumPy) converges while "r" (computed by the float32 Keras model) does not. Change the model's datatype to float64 and the code works as intended.
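You can see the mechanism without Keras (a minimal sketch; the step size mimics the default '2-point' step of "least_squares"):

```
import numpy as np

eps = np.sqrt(np.finfo(np.float64).eps)  # ~1.5e-8, the default '2-point' finite-difference step

w = -9.0
print(np.float64(w) + eps - np.float64(w))               # ~1.5e-8: the perturbation survives in float64
print(np.float32(np.float32(w) + eps) - np.float32(w))   # 0.0: the perturbation is rounded away in float32
```

And the fix, assuming the standard Keras backend call "keras.backend.set_floatx" (it must run before any layer is created):

```
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

K.set_floatx('float64')  # All layers created from now on default to float64.

model = Sequential()
model.add(Dense(1, input_dim=1, use_bias=False, activation='linear'))
print(model.layers[0].get_weights()[0].dtype)  # float64
```

With that change, the finite-difference perturbations survive the round trip through the model's weights, and the Levenberg-Marquardt fit converges for "r" just as it does for "r2".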