Fitting a Gaussian to a probability distribution to find the standard deviation, in python using matplotlib

Question

Fitting a Gaussian to a probability distribution to find the standard deviation, in python using matplotlib

221 Views Asked by AXF567 At 06 December 2023 at 12:33

I'm quite new to coding so please be patient. I'm trying to model different probability distributions, and I would like to fit a Gaussian to each one and then find the standard deviation of the produced Gaussians, and then compare those std devs.

I'm unsure how to tell python to find the closest Gaussian to my curve, so any help is appreciated.

Here is my code so far, I tried to use curve_fit, but I really don't know how to, so I took it out, and I'm unsure how to define the Gaussian for the best fit.

import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg, special, optimize, stats
from numpy import math
from scipy.optimize import curve_fit

q = 4000
qA = np.arange(0.000001,q+1,1.0)
qB = q - qA
N = 2000
NA = 1000
NB = N - NA

ASA = (qA + NA -1)*np.log(qA + NA -1)-(qA*np.log(qA)) - (NA -1)*np.log(NA -1) #sterling approximate of SA/kb

ASB = (qB + NB -1)*np.log(qB + NB -1)-(qB*np.log(qB)) - (NB -1)*np.log(NB -1) #sterling approximate of SB/kb



TATA_list=[]
for i in range(1,len(qA)-1):
    
TATE = (qA[i+1] - qA[i-1]) / (ASA[i+1] - ASA[i-1])
TATA_list.append(TATE)                      

TBTA_list=[]
for i in range(1,len(qA)-1):
    
TBTE = (qB[i+1] - qB[i-1]) / (ASB[i+1] - ASB[i-1])
TBTA_list.append(TBTE)                      

AStot = (q + N -1)*np.log(q + N -1)-(q*np.log(q))-(N-1)*np.log(N-1)
Paprox = np.exp(ASA + ASB - AStot)
SUMPX = sum(Paprox[:-1])
NormPAprox = Paprox/SUMPX

plt.plot(qA/q, NormPAprox)

Original Q&A

There are 2 best solutions below

Muhammed Yunus On 06 December 2023 at 18:57

Here's an example of fitting a Gaussian to some data using sklearn, with some visualisation. The data is just a list of measurements. You can fit more Gaussians, and they can be multidimensional.

Create some mock data and plot it:

import numpy as np
import matplotlib.pyplot as plt

#Create some test data
np.random.seed(0)
measurements = np.random.randn(100)

#Plot the data
plt.scatter(range(0, len(measurements)), measurements, marker='s', c='tab:green', label='raw data')
plt.xlabel('sample number')
plt.ylabel('measurement')
plt.legend()

Fit model and report results:

#Fit Gaussian to the data
from sklearn.mixture import GaussianMixture

n_gaussians = 1
gmm = GaussianMixture(n_components=n_gaussians).fit(measurements.reshape(-1, 1))

#Print results
print(f'Fitted {n_gaussians} Gaussians to {len(measurements)} samples:')
for idx in range(n_gaussians):
    print(f' Gaussian {idx} mean:', gmm.means_.flatten().round(2))
    print(f' Gaussian {idx} var: ', gmm.covariances_.flatten().round(2))

#Overlay the fitted Gaussians
for idx in range(n_gaussians):
    mean = gmm.means_[idx]
    var = gmm.covariances_[idx]
    colour = plt.get_cmap('jet')((idx + 1)/ n_gaussians)
    plt.axhline(mean, c=colour, lw=2, ls='-', label=f'Gaussian {idx + 1} of {n_gaussians} (mean ± sd)')
    plt.vlines(idx, ymin=mean - var**0.5, ymax=mean + var**0.5, colors=colour)

plt.legend(bbox_to_anchor=(1.1, 1));

**Muhammed Yunus** · Accepted Answer · 2023-12-07T09:20:46.713000

I start by defining the curve to fit, providing an initial guess of the parameters. Then I run the fit and print out the results.

#Define the curve
def gaussian(x, mean, sigma, amplitude):
    return amplitude * np.exp(-(x - mean)**2 / (2 * sigma**2))

#Drop NaNs
is_nan = np.isnan(NormPAprox)
is_valid = ~is_nan
print('Dropping', is_nan.sum(), 'NaNs')
x_vals = (qA / q)[is_valid]
y_vals = NormPAprox[is_valid]

#Initial guess for each parameter
# mean: x location of the peak
# std: std of x
# amplitude: amplitude of the peak
initial_guess = [x_vals[np.argmax(y_vals)], x_vals.std(), y_vals.max()]

#Get the fitted params
fit_params, fit_covariance = curve_fit(gaussian, x_vals, y_vals, p0=initial_guess)
print('Fitted mean, std, amplitude are:', fit_params.round(3))
print(' Variance of each estimate:', fit_covariance.diagonal().round(3))

Visualise the data and the fit (zoomed onto the peak):

#Plot original and estimated
plt.plot(x_vals, y_vals, linewidth=8, color='darkgreen', label='data')
plt.plot(x_vals, gaussian(x_vals, *fit_params), color='tab:orange', label='fitted')
plt.xlim(0.45, 0.55)
plt.title('Data and fit')
plt.legend()
plt.xlabel('qA/q')
plt.ylabel('NormPAprox')
plt.gcf().set_size_inches(5, 3)

Fitting a Gaussian to a probability distribution to find the standard deviation, in python using matplotlib

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in MATPLOTLIB

Related Questions in CURVE-FITTING

Related Questions in GAUSSIAN

Trending Questions

Popular # Hahtags

Popular Questions