fit bimodal skewed gaussian

58 Views Asked by At

I have some data that I am trying to fit with a bimodal skewed gaussian. I started with a standadard bimodal gaussian and the data is just too skew. I also want to be able to extract the underlying gaussian.

For the bimodal, I have (it is a bit more than needed due to the loop and the plotting):

def gauss(x,mu,sigma,A):
    return A*np.exp(-(x-mu)**2/2/sigma**2)

def bimodal(x,mu1,sigma1,A1,mu2,sigma2,A2):
    return gauss(x,mu1,sigma1,A1)+gauss(x,mu2,sigma2,A2)

params=np.empty(6,dtype=object)

for i in range(len(files)):
    x=data_x[i]
    y=data_y[i]
    
    max_count=np.max(data_y[i][2:])
     
    peaks, _ = find_peaks(data_y[i], height=max_count-5, distance=4)
    
    
    p_init=[data_x[i][peaks[0]],1,max_count,data_x[i][peaks[1]],1,max_count,]
    params,cov=curve_fit(bimodal,x[0:len(y)],y,p_init)
    
    params=np.vstack([params,params])
    x=x.astype(int)
    plt.bar(x[0:len(y)],y)
    
    plt.plot(x,bimodal(x,*params),'r')
    plt.title(naming_string[i]+' -> index = '+ str(i))
    plt.show()

The data I have looks like this:

enter image description here

It really needs a skew-ed fit with the left curve having a positive skew and the right curve having a negative skew.

What I would like to do is just modify the input function to do this. However all of the skewed normal distributions use scipy to bypass the input function. This doesn't allow me to modidy it for a bimodal (that I can tell).

The function I have tried is a scipy function:

def bimodal_skew_normal(x, amp1, mean1, std1, skew1, amp2, mean2, std2, skew2):
    pdf1 = amp1 * skewnorm.pdf(x, skew1, loc=mean1, scale=std1)
    pdf2 = amp2 * skewnorm.pdf(x, skew2, loc=mean2, scale=std2)
    return pdf1 + pdf2

which will likely work after some tweaking of the initial parameters, however I am not sure how to get the underlying gaussian from these. the amplitude and mean that pop out of the fit do not correspond to an unskewed gaussian.

1

There are 1 best solutions below

0
jlandercy On

Without any dataset we must first create a synthetic one. Assuming your data does follow a double skewed normal, lets do so.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, optimize

def peak(x, A, a, loc, scale):
    return A * stats.skewnorm.pdf(x, a=a, loc=loc, scale=scale)

def model(x, A1, a1, loc1, scale1, A2, a2, loc2, scale2):
    return peak(x, A1, a1, loc1, scale1) + peak(x, A2, a2, loc2, scale2)

p0 = (500, 1.2, 140, 10, 600, 1.5, 170, 15)

xexp = np.linspace(120, 220, 70)
yexp = model(xexp, *p0)

Now we can fit this distribution with the model:

popt, pcov = optimize.curve_fit(
    model, xexp, yexp,
    p0=(1000, 1, 150, 1, 1000, 1, 180, 1),
    bounds = [
        (1, 0.1, 140, 0.1, 1, 0.1, 170, 0.1),
        (1e4, 2.0, 160, 100, 1e4, 2.0, 190, 100)
    ]
)
# array([500. ,   1.2, 140. ,  10. , 600. ,   1.5, 170. ,  15. ])

Where initial guess and bounds are of importance.

Final result, looks like:

xlin = np.linspace(xexp.min(), xexp.max(), 250)
yhat = model(xlin, *popt)

fig, axe = plt.subplots()
axe.step(xexp, yexp, where="mid")
axe.plot(xlin, yhat)
axe.grid()

enter image description here