On docs.scipy.org there's a code example that generates a Pareto distribution. I could understand most of the snippet except the use of the term 'fit' for the PDF (probability density function) and the formula: max(count)*fit/max(fit)
Here's the code snippet:
import numpy as np
import matplotlib.pyplot as plt
a, m = 3., 2. # shape and mode
s = (np.random.pareto(a, 1000) + 1) * m
count, bins, _ = plt.hist(s, 100, normed=True)  # recent Matplotlib versions use density=True instead of normed
fit = a*m**a / bins**(a+1)
plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')
plt.show()
I searched the web thoroughly for the formula max(count)*fit/max(fit), and even replaced the term 'fit' with 'pdf', but could not find any leads. Kindly explain what the formula is conveying.
I assumed the term 'fit' is used instead of PDF because the formula assigned to fit is the PDF of the Pareto distribution.
Finally, what does the underscore '_' in the code convey:
count, bins, _ = plt.hist(s, 100, normed=True)
np.random.pareto draws random samples from the Pareto II distribution. The resulting data are therefore realisations from this distribution, rather than the probability density of the distribution.
In the call to plt.hist we use the normed=True argument. This normalises the data and plots the density of our samples on the y-axis, rather than the frequency. We then wish to fit a Pareto distribution to our randomly sampled data and plot this distribution on top of our data.
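To make the difference between samples and a density concrete, here is a minimal sketch. It uses np.histogram with density=True as a stand-in for plt.hist (an assumption made so the example runs without a plotting backend) and a seeded generator so the draws are repeatable. It checks that the normalised bar heights integrate to 1 over the bin widths.

```python
import numpy as np

# Seeded generator so the sketch is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)

a, m = 3.0, 2.0  # shape and mode, as in the question
# Draws from Pareto II, shifted and scaled as in the snippet.
s = (rng.pareto(a, 1000) + 1) * m

# density=True normalises the bar heights so that the histogram
# has unit area; bins holds the 101 edges of the 100 bins.
count, bins = np.histogram(s, bins=100, density=True)

# Area under the histogram: sum of height * width over all bins.
area = np.sum(count * np.diff(bins))
print("area under histogram:", area)
print("smallest sample:", s.min())
```

The samples themselves are plain numbers no smaller than m; only the normalised histogram of those samples is on the same scale as a probability density.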
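To do so we begin by computing the probability density of the Pareto distribution at the x-values defined by bins, with parameters a and m. This is our definition of fit: fit = a*m**a / bins**(a+1).
The necessity of the max(count) * fit / max(fit) term is a little more elusive. I think it's clear why we'd include fit in the plotting command, but why the ratio max(count) / max(fit)? Actually, I'm not 100% sure of the intent. Since max(count) / max(fit) is a single constant, it only rescales the curve so that its peak equals the height of the tallest histogram bar; that looks more like a visual alignment of the curve with the histogram than a bias correction.
As for the underscore: plt.hist returns three values (the counts, the bin edges and the drawn patches), and _ is just a conventional throwaway name for the third, which the example doesn't need.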
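The effect of the scaling term can be checked numerically. The sketch below is an illustration, not the documentation's own code: it again assumes np.histogram with density=True in place of plt.hist, and the names factor and scaled are introduced here only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability

a, m = 3.0, 2.0
s = (rng.pareto(a, 1000) + 1) * m
count, bins = np.histogram(s, bins=100, density=True)

# Pareto density evaluated at the bin edges (same formula as the snippet).
fit = a * m**a / bins**(a + 1)

# The expression from the plotting call.
scaled = max(count) * fit / max(fit)

# It is the same curve multiplied by one constant factor...
factor = max(count) / max(fit)
print("constant factor:", factor)

# ...chosen so that the curve's peak equals the tallest bar.
print("peak of scaled curve:", scaled.max())
print("tallest histogram bar:", count.max())
```

Because the factor is a single number, the shape of the curve is unchanged; only its height is adjusted to match the histogram.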