Fitting a power law

127 Views Asked by At

I'm trying to recreate an analysis from the article: "From time series to complex networks: The visibility graph". Here is the link to the article: https://www.pnas.org/doi/10.1073/pnas.0709247105#core-B14. The series i'm working on is the Brownian motion B(t) and this is the code i used to generate the data:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
def brownian_motion(t, a=1):
    dt = t[1] - t[0]
    dW = np.random.normal(scale=np.sqrt(dt), size=t.shape)
    W = np.cumsum(dW)
    return a**(1/2) * W

t = np.linspace(0, 100, 100000)

np.random.seed(42)

W = brownian_motion(t)

plt.plot(t, W)
plt.xlabel('t')
plt.ylabel('W(t)')
plt.title('Brownian Motion')
plt.show()

df = pd.DataFrame(W, index=t, columns=['W(t)'])
df.index.name = 't'
df.to_csv('brownian_motion_data.csv')

The below is the result i hope to achieve: enter image description here

import numpy as np
import matplotlib.pyplot as plt
from ts2vg import NaturalVG
import networkx as nx
import pandas as pd
import powerlaw

data = pd.read_csv('brownian_motion_data.csv')
dat = data['W(t)'][:500]
nodes = range(len(dat))
vg = NaturalVG()
vg.build(dat)
edges = vg.edges
G = nx.Graph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)

n = nx.number_of_nodes(G)
degree_freq = nx.degree_histogram(G)
degree_dist = []
for i in degree_freq:
   x = i / n
   degree_dist.append(x)
degrees = range(len(degree_freq))
fit = powerlaw.Fit(np.array(degree_dist) + 10 ** (-6))
alpha = round(fit.power_law.alpha, 3)

plt.figure(figsize=(8, 6))
plt.scatter(degrees, degree_dist, marker='o', color='b', label='Data')
fit.power_law.plot_pdf(color='r', linestyle='--', label=f'Fit (alpha={alpha})')
plt.xlabel('Degree')
plt.ylabel('Frequency')
plt.legend()
plt.show()

degrees_array = np.array(degrees)
degree_dist_array = np.array(degree_dist)
coefficients = np.polyfit(degrees_array, degree_dist_array, 2)
poly_function = np.poly1d(coefficients)
degrees_fit = np.linspace(min(degrees_array), max(degrees_array), 100)
degree_dist_fit = poly_function(degrees_fit)

plt.loglog(degrees_array, degree_dist_array, label='Original Data')
plt.loglog(degrees_fit, degree_dist_fit, label=f'Polynomial Fit', color='red')
plt.xlabel('Degrees')
plt.ylabel('Degree Distribution')
plt.title('Polynomial Fitting Example for Degree Distribution')
plt.legend()
plt.show()

enter image description hereenter image description here

My two approaches is using the powerlaw and polyfit function. Since there are 0 values for the probability of degree, combined with my adjust, the result i get is pretty much disappointing. I'm looking forward to some suggestion and even other alternative ways to get the powerlaw fit similar in the above article.

1

There are 1 best solutions below

0
Oskar Hofmann On

I am not very familiar with the powerlaw package but after skimming the corresponding paper, I assume that what is missing in your code is identifying the correct data range for the fit of the power law (see section "Identifying the Scaling Range"). Your data clearly only partially follows a power law.

You are also preprocessing your data, calculating the probability density function manually. If I understand the powerlaw package and the examples correctly, this is done by the package itself and your input data should be the raw data. Otherwise one could just fit a custom powerlaw-function via e.g. scipy.optimize.curve_fit (see below). This preprocessing is what the powerlaw package is for.

The np.polyfit function works as intended but your data simply does not follow a 2nd degree polynomial.

Edit: Here a working example using scipy.optimize.curve_fit:

def powlaw(x,a,b,c):
    return np.power((x-b),a) + c

# omit first values not following a powerlaw
degrees_array = np.array(degrees)[5:]
degree_dist_array = np.array(degree_dist)[5:]
# initial guess for a,b,c
p0 = [-2,0,0]
popt, pcov = curve_fit(powlaw, degrees_array, degree_dist_array, p0)

plt.loglog(degrees_array, degree_dist_array, label='Original Data')
plt.loglog(degrees_array, powlaw(degrees_array, *popt), label=f'Powerlaw Fit', color='red')
plt.xlabel('Degrees')
plt.ylabel('Degree Distribution')
plt.legend()
plt.show()

print(f'{popt=}')

Fit without powerlaw package

popt=array([-1.45852810e+00, -6.60754238e-01, -2.66451297e-04])

It is not a perfect fit but this just means that the data does not fully follow a power law. Probably a good first step would be to further omit data points from the beginning of your data array. Which is again just what the "Identifiyng the Scaling Rage" section of the powerlaw package paper is about.