JS divergence code is giving wrong answers


I am using the code below to compute the Jensen-Shannon (JS) divergence between two PDFs, but I am getting unexpected answers. I took three datasets A, B and C, where A and B are similar and C is completely different from both. I fit a GMM to each dataset and computed the JS divergence between the resulting PDFs, but the code gives results like the following:

Note: The PDFs are multivariate

A-A: 0.000000245 (which is expected)

A-B: 0.00243 (which is also expected, since A and B are similar)

A-C: 0.003114 (this is where I am confused: A and C are completely different, so why is this value so small?)

from scipy.stats import entropy
import numpy as np

def js_divergence(pdf1, pdf2, epsilon=1e-10):
    # Ensure the PDFs are normalized
    pdf1_normalized = pdf1 / np.sum(pdf1)
    pdf2_normalized = pdf2 / np.sum(pdf2)

    # Add a small epsilon to avoid log(0)
    pdf1_safe = pdf1_normalized + epsilon
    pdf2_safe = pdf2_normalized + epsilon

    # Calculate the midpoint between the two epsilon-adjusted, normalized PDFs
    m = 0.5 * (pdf1_safe + pdf2_safe)

    # Calculate the Kullback-Leibler divergences and then the Jensen-Shannon divergence
    kl_divergence1 = entropy(pdf1_safe.ravel(), m.ravel(), base=2)
    kl_divergence2 = entropy(pdf2_safe.ravel(), m.ravel(), base=2)
    js_div = 0.5 * (kl_divergence1 + kl_divergence2)

    return js_div

# Calculate the JS divergence
js_divergence_cloud_1_2 = js_divergence(pdf_cloud_1, pdf_cloud_2)
print(f"JS divergence between cloud_1 and cloud_2: {js_divergence_cloud_1_2}")

I do not know what mistake I made. Could someone please explain it?
