What is max(count)*fit/max(fit) supposed to mean? What is the term 'fit' supposed to convey?


On docs.scipy.org there is a code snippet that generates a Pareto distribution. I understand most of the snippet, except the use of the term 'fit' for the PDF (probability density function) and the formula: max(count)*fit/max(fit)

Here's the code snippet:

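import matplotlib.pyplot as plt
import numpy as np  # needed for np.random.pareto below
a, m = 3., 2.  # shape and mode
s = (np.random.pareto(a, 1000) + 1) * m
# note: newer Matplotlib releases use density=True in place of normed=True
count, bins, _ = plt.hist(s, 100, normed=True)
fit = a*m**a / bins**(a+1)
plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')
plt.show()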

I searched the web thoroughly for the formula max(count)*fit/max(fit), and even tried replacing the term 'fit' with 'pdf', but could not find any leads. Could someone explain what the formula is meant to convey?

I assume the term 'fit' is used instead of PDF because the expression assigned to fit is the PDF formula of the Pareto distribution.

Finally, what does the underscore '_' convey in this line of the code:

count, bins, _ = plt.hist(s, 100, normed=True)

There is 1 best solution below

jwalton On

np.random.pareto draws random samples from the Pareto-II distribution. The resulting data is therefore realisations from this distribution, rather than the probability density of the distribution.
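
As a minimal sketch of the sampling step (the seed and the names lomax and n are illustrative assumptions, not part of the original snippet), the raw draws start at 0, and the (x + 1) * m transformation used in the question moves them so that they start at m:

```python
import numpy as np

np.random.seed(0)  # illustrative seed, only for repeatability

a, m = 3.0, 2.0    # shape and scale (minimum value), as in the question
n = 1000

lomax = np.random.pareto(a, n)  # Pareto II (Lomax) samples, all >= 0
s = (lomax + 1) * m             # shifted and scaled samples, all >= m

print(lomax.min() >= 0, s.min() >= m)
```

Either way, s holds individual sample values, not density values.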

In the call to plt.hist we use the normed=True argument (spelled density=True in newer Matplotlib releases). This normalises the histogram so that the areas of the bars sum to 1, so the y-axis shows the density of our samples rather than the frequency.
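
The normalisation can be checked without drawing anything. The sketch below assumes np.histogram with density=True as a stand-in for the plt.hist call; it returns only the counts and the bin edges, whereas plt.hist returns a third value (the drawn bar objects), which is what the snippet discards by assigning it to the conventional throwaway name _.

```python
import numpy as np

np.random.seed(0)  # illustrative seed
a, m = 3.0, 2.0
s = (np.random.pareto(a, 1000) + 1) * m

# Same binning idea as plt.hist(s, 100, ...), but no plot is produced
count, bins = np.histogram(s, bins=100, density=True)

widths = np.diff(bins)          # width of each bin
area = np.sum(count * widths)   # total area under the histogram

print(len(count), len(bins))    # 100 bars, 101 edges
print(area)                     # approximately 1
```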

We then wish to plot a Pareto distribution on top of our randomly sampled data for comparison. Note that nothing is estimated from the samples here: the curve uses the same a and m that generated them.

To do so we begin by computing the probability density of the Pareto distribution, with parameters a and m, at the x-values defined by bins (the bin edges returned by plt.hist). This is our definition of fit: fit = a*m**a / bins**(a+1).
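
To see that this expression really is a density, here is a small sketch. The helper name pareto_pdf and the integration grid are assumptions made for illustration; the formula itself is the one from the snippet, set to zero below m.

```python
import numpy as np

def pareto_pdf(x, a, m):
    """Pareto density a*m**a / x**(a+1) for x >= m, and 0 below m."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= m, a * m**a / x**(a + 1), 0.0)

a, m = 3.0, 2.0

# Density at the left end of the support equals a / m
peak = float(pareto_pdf(m, a, m))

# Trapezoidal integration over [m, 200]; the tail beyond 200 is negligible
x = np.linspace(m, 200.0, 200001)
y = pareto_pdf(x, a, m)
total = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

print(peak, total)
```

The density is largest at x = m and decreases from there, which is why max(fit) is simply the value of fit at the first bin edge.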

The necessity of the max(count) * fit / max(fit) term is a little more elusive. I think it's clear why we'd include fit in the plotting command, but why the ratio max(count) / max(fit)? Actually, I'm not 100% sure.

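Mechanically, max(count) / max(fit) is a single constant, so multiplying fit by it rescales the whole curve so that its peak equals the height of the tallest histogram bar. Why that rescaling is wanted is less obvious: it looks like it could be a bias correction from fitting the Pareto distribution to our data.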
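
Whatever the motivation, the arithmetic of max(count) * fit / max(fit) can be checked directly. The sketch below again assumes np.histogram in place of plt.hist, and the names scale and scaled are illustrative.

```python
import numpy as np

np.random.seed(0)  # illustrative seed
a, m = 3.0, 2.0
s = (np.random.pareto(a, 1000) + 1) * m

count, bins = np.histogram(s, bins=100, density=True)
fit = a * m**a / bins**(a + 1)   # Pareto density at the bin edges

scale = max(count) / max(fit)    # one constant for the whole curve
scaled = scale * fit             # same as max(count) * fit / max(fit)

print(max(fit), max(count), max(scaled))
```

The shape of the curve is unchanged; only its height is adjusted so that max(scaled) matches max(count).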