I'm trying to fit an asymmetric gaussian to my data. My data is just a numpy array called wave (x) and a numpy array called spec (y) that looks like an asymmetric gaussian.
This is the function:
def agauss(amp, cen, b_sigma, r_sigma, x):
y = np.zeros(len(x))
for i in range(len(x)):
if x[i] < cen:
y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*b_sigma**2))
else:
y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*r_sigma**2))
return y
I'm using this code to fit the parameters:
with pm.Model() as asym:
cen = pm.Uniform('cen', lower=5173, upper=5179)
bsigma = pm.HalfCauchy('bsigma', beta=3)
rsigma = pm.HalfCauchy('rsigma', beta=3)
amp = pm.Uniform('amp', lower=1e-19, upper=1e-16)
err = pm.HalfCauchy('err', beta=0.0000001)
ag_pred = pm.Normal('ag_pred', mu=agauss(amp, cen, bsigma, rsigma, wave), sigma=err, observed=spec)
agdata = pm.sample(3000, cores=2)
But I get the error "Variables do not support boolean operations" in the theano.tensor module. How should I define the function in order to fit the paremeters? There is a better way to do this? Thanks!!
144 err = pm.HalfCauchy('err', beta=0.0000001)
145
--> 146 ag_pred = pm.Normal('ag_pred', mu=agauss(amp, cen, bsigma, rsigma, wave), sigma=err, observed=spec)
147
148 agdata = pm.sample(3000, cores=2)
~/Documents/OIII_emitters/m2fs_reduction/test/assets/scripts/analysis.py in agauss(amp, cen, b_sigma, r_sigma, x)
118 y = np.zeros(len(x))
119 for i in range(len(x)):
--> 120 if x[i] < cen:
121 y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*b_sigma**2))
122 else:
~/anaconda3/envs/data_science/lib/python3.7/site-packages/theano/tensor/var.py in __bool__(self)
92 else:
93 raise TypeError(
---> 94 "Variables do not support boolean operations."
95 )
96
TypeError: Variables do not support boolean operations.
Here is an approach that follows the
switchpointtrick of pymc3. It's basically an non-linear extension of this SO question. Below I show the code and how it works with some mock data that I created. I don't expect the mock data to be too similar with your actual data but it should give you a starting point. This approach does have some modelling limitations (i.e. fitting two gaussians) but again they could go away with more realistic input data.Firstly I create some mock input data. Note that two gaussians attached to each other will have different normalisations and will also approach the continuum of your spectrum somewhat differently. I didn't attempt to include the latter subtlety in the mock data but I kept the normalisation consistent. Another complication is the fact that the amplitude of the two gaussians is different. This probably won't be a problem with real data but here I simply adda constant to match them. From these mock data we would like to recover parameters
cen_mock,sigma_mock_1andsigma_mock_2. Finally note that I kept the same frequency limits as in your question.Once we have some mock data we build the model in pymc3 and sample:
The parameters of the model is the
swithcpointwhich represents where the gaussians change and the standard deviations. So the sampler will be estimating the "right" or "left" standard deviation based on the value ofx.We now check the results:
There are of course much better ways of checking the sampling results (which are actually bayesian) but this is a quick and dirty way to get a feel of what is going on. Note that after the modelling I still need to adda constant to one of the gaussians to join them naturally. For three different pairs of standard deviations I get the following plots:
Now for some discussion. To evaluate how good this approach is we need to have some better idea of the real data. As you can see if the standard deviations differ significantly then the model starts failing. However for somewhat similar standard deviation and the edge case of the standard deviation being equal the model is somewhat acceptable. Furthermore another important input is whether the number of frequency datapoints is equal for the two gaussians. This will determine how each gaussian approaches the continuum. It will also determine whether there any need to arbitrarily adda constant to the second gaussian in order to join them in a more natural way when fitting on real data.
To sum up, this is an approach that follows closely your desired parametrisation, but with some extra work in evaluation using mock data. From your question it seems to me that the
switchpointapproach is required, but only when applied to real data (or more realistic mock data) one can know with some certainty if it's sufficient.