Fitting a Gaussian to a probability distribution to find the standard deviation, in python using matplotlib

221 Views Asked by At

Plot of one distribution

I'm quite new to coding so please be patient. I'm trying to model different probability distributions, and I would like to fit a Gaussian to each one and then find the standard deviation of the produced Gaussians, and then compare those std devs.

I'm unsure how to tell python to find the closest Gaussian to my curve, so any help is appreciated.

Here is my code so far, I tried to use curve_fit, but I really don't know how to, so I took it out, and I'm unsure how to define the Gaussian for the best fit.

import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg, special, optimize, stats
from numpy import math
from scipy.optimize import curve_fit

q = 4000
qA = np.arange(0.000001,q+1,1.0)
qB = q - qA
N = 2000
NA = 1000
NB = N - NA

ASA = (qA + NA -1)*np.log(qA + NA -1)-(qA*np.log(qA)) - (NA -1)*np.log(NA -1) #sterling approximate of SA/kb

ASB = (qB + NB -1)*np.log(qB + NB -1)-(qB*np.log(qB)) - (NB -1)*np.log(NB -1) #sterling approximate of SB/kb



TATA_list=[]
for i in range(1,len(qA)-1):
    
TATE = (qA[i+1] - qA[i-1]) / (ASA[i+1] - ASA[i-1])
TATA_list.append(TATE)                      

TBTA_list=[]
for i in range(1,len(qA)-1):
    
TBTE = (qB[i+1] - qB[i-1]) / (ASB[i+1] - ASB[i-1])
TBTA_list.append(TBTE)                      

AStot = (q + N -1)*np.log(q + N -1)-(q*np.log(q))-(N-1)*np.log(N-1)
Paprox = np.exp(ASA + ASB - AStot)
SUMPX = sum(Paprox[:-1])
NormPAprox = Paprox/SUMPX

plt.plot(qA/q, NormPAprox)



2

There are 2 best solutions below

2
Muhammed Yunus On BEST ANSWER

I start by defining the curve to fit, providing an initial guess of the parameters. Then I run the fit and print out the results.

enter image description here

#Define the curve
def gaussian(x, mean, sigma, amplitude):
    return amplitude * np.exp(-(x - mean)**2 / (2 * sigma**2))

#Drop NaNs
is_nan = np.isnan(NormPAprox)
is_valid = ~is_nan
print('Dropping', is_nan.sum(), 'NaNs')
x_vals = (qA / q)[is_valid]
y_vals = NormPAprox[is_valid]

#Initial guess for each parameter
# mean: x location of the peak
# std: std of x
# amplitude: amplitude of the peak
initial_guess = [x_vals[np.argmax(y_vals)], x_vals.std(), y_vals.max()]

#Get the fitted params
fit_params, fit_covariance = curve_fit(gaussian, x_vals, y_vals, p0=initial_guess)
print('Fitted mean, std, amplitude are:', fit_params.round(3))
print(' Variance of each estimate:', fit_covariance.diagonal().round(3))

Visualise the data and the fit (zoomed onto the peak):

#Plot original and estimated
plt.plot(x_vals, y_vals, linewidth=8, color='darkgreen', label='data')
plt.plot(x_vals, gaussian(x_vals, *fit_params), color='tab:orange', label='fitted')
plt.xlim(0.45, 0.55)
plt.title('Data and fit')
plt.legend()
plt.xlabel('qA/q')
plt.ylabel('NormPAprox')
plt.gcf().set_size_inches(5, 3)
5
Muhammed Yunus On

Here's an example of fitting a Gaussian to some data using sklearn, with some visualisation. The data is just a list of measurements. You can fit more Gaussians, and they can be multidimensional.

enter image description here

Create some mock data and plot it:

import numpy as np
import matplotlib.pyplot as plt

#Create some test data
np.random.seed(0)
measurements = np.random.randn(100)

#Plot the data
plt.scatter(range(0, len(measurements)), measurements, marker='s', c='tab:green', label='raw data')
plt.xlabel('sample number')
plt.ylabel('measurement')
plt.legend()

Fit model and report results:

#Fit Gaussian to the data
from sklearn.mixture import GaussianMixture

n_gaussians = 1
gmm = GaussianMixture(n_components=n_gaussians).fit(measurements.reshape(-1, 1))

#Print results
print(f'Fitted {n_gaussians} Gaussians to {len(measurements)} samples:')
for idx in range(n_gaussians):
    print(f' Gaussian {idx} mean:', gmm.means_.flatten().round(2))
    print(f' Gaussian {idx} var: ', gmm.covariances_.flatten().round(2))

#Overlay the fitted Gaussians
for idx in range(n_gaussians):
    mean = gmm.means_[idx]
    var = gmm.covariances_[idx]
    colour = plt.get_cmap('jet')((idx + 1)/ n_gaussians)
    plt.axhline(mean, c=colour, lw=2, ls='-', label=f'Gaussian {idx + 1} of {n_gaussians} (mean ± sd)')
    plt.vlines(idx, ymin=mean - var**0.5, ymax=mean + var**0.5, colors=colour)

plt.legend(bbox_to_anchor=(1.1, 1));