I have sampled some data from a network G with discrete values of node degrees in a network and calculated the distribution.
def degree_distribution(G):
vk = dict(G.degree())
vk = list(vk.values()) # we get only the degree values
maxk = np.max(vk)
mink = np.min(min)
kvalues= np.arange(0,maxk+1) # possible values of k
Pk = np.zeros(maxk+1) # P(k)
for k in vk:
Pk[k] = Pk[k] + 1
Pk = Pk/sum(Pk) # the sum of the elements of P(k) must to be equal to one
return kvalues,Pk
Calling it:
kvalues, Pk = degree_distribution(G)
dict_prob = dict(zip(kvalues,Pk))
I get:
{0: 0.0,
1: 0.0,
2: 0.0016146393972012918,
3: 0.004843918191603875,
4: 0.011840688912809472,
5: 0.03336921420882669,
6: 0.07319698600645856,
7: 0.10764262648008611,
8: 0.15177610333692143,
9: 0.16361679224973089,
10: 0.16254036598493002,
11: 0.11679224973089343,
12: 0.08880516684607104,
13: 0.052206673842841764,
14: 0.02099031216361679,
15: 0.006996770721205597,
16: 0.003767491926803014}
How do I test this sampled data for a binomial distribution, using scipy?
If you just want to know how how good a fit is a binomial PMF to your empirical distribution, you can simply do:
Which gives:
Edit: To test the hypothesis that the empirical frequencies follow the expected ones from a binomial distribution, you could use stats.chisquare:
Note that the null hypothesis here is that the frequencies are the same, so a p-value > 0.05 would be evidence for the null.