I am using the code below to compute the JS divergence between two PDFs, but I am getting some unexpected answers. I took three datasets A, B, and C, where A and B are similar and C is completely different from both. I fit a GMM to each dataset and compute the JS divergence between the resulting PDFs, but the code gives answers like:
Note: The PDFs are multivariate
A-A: 0.000000245 (which is expected)
A-B: 0.00243 (which is also expected since A and B are similar)
A-C: 0.003114 (this is where I am confused: A and C are different, so why is this value so small?)
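For reference, the PDFs are produced roughly like this (a simplified 2-D sketch of my setup; the names cloud_1/cloud_2, the component count, the grid resolution, and the use of sklearn's score_samples are just how I have illustrated it here, not the exact code):

import numpy as np
from sklearn.mixture import GaussianMixture

# Fit one GMM per point cloud (each dataset is an (n_samples, n_features) array)
gmm_1 = GaussianMixture(n_components=3, random_state=0).fit(cloud_1)
gmm_2 = GaussianMixture(n_components=3, random_state=0).fit(cloud_2)

# Evaluate both fitted densities on the SAME grid so the arrays are comparable element-wise
both = np.vstack([cloud_1, cloud_2])
xs = np.linspace(both[:, 0].min(), both[:, 0].max(), 100)
ys = np.linspace(both[:, 1].min(), both[:, 1].max(), 100)
grid = np.array(np.meshgrid(xs, ys)).reshape(2, -1).T

# score_samples returns log densities, so exponentiate to get the PDF values
pdf_cloud_1 = np.exp(gmm_1.score_samples(grid))
pdf_cloud_2 = np.exp(gmm_2.score_samples(grid))

And here is the divergence function itself: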
from scipy.stats import entropy
import numpy as np
def js_divergence(pdf1, pdf2, epsilon=1e-10):
    # Ensure the PDFs are normalized
    pdf1_normalized = pdf1 / np.sum(pdf1)
    pdf2_normalized = pdf2 / np.sum(pdf2)
    # Add a small epsilon to avoid log(0)
    pdf1_safe = pdf1_normalized + epsilon
    pdf2_safe = pdf2_normalized + epsilon
    # Calculate the midpoint between the two epsilon-adjusted, normalized PDFs
    m = 0.5 * (pdf1_safe + pdf2_safe)
    # Calculate the Kullback-Leibler divergences and then the Jensen-Shannon divergence
    kl_divergence1 = entropy(pdf1_safe.ravel(), m.ravel(), base=2)
    kl_divergence2 = entropy(pdf2_safe.ravel(), m.ravel(), base=2)
    js_div = 0.5 * (kl_divergence1 + kl_divergence2)
    return js_div
# Calculate the JS divergence
js_divergence_cloud_1_2 = js_divergence(pdf_cloud_1, pdf_cloud_2)
print(f"JS divergence between cloud_1 and cloud_2: {js_divergence_cloud_1_2}")
I do not know what mistake I made. Could someone please explain it?