Does this middle variable have any information gain?

76 Views Asked by Chas At 23 August 2025 at 10:56

Let's say I have two variables: A (the input) and C (the output).
So the model is A -> C.
There's also another variable B, such that:
corr(A, B) > corr(A, C)
corr(C, B) > corr(A, C)

Would A -> B -> C perform better than the existing model?
In other words, does B provide any information gain?


Azmisov On 27 July 2021 at 06:51

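The information gained about C, given A, is the mutual information I(A;C) = H(C) - H(C|A), i.e. how much knowing A reduces the uncertainty about C. The information gained about C, given both A and B, is I(A,B;C) = I(A;C) + I(B;C|A). Since I(B;C|A) >= 0, including B never loses information, and there will be more information gained whenever B tells you something about C that A does not already.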
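A minimal sketch of how this could be checked empirically, assuming discrete variables and a simulated chain in which B is a noisy copy of A and C is a noisy copy of B. The noise levels, sample size, and helper names (`mutual_information`, `flip`) are illustrative assumptions, not part of the question. It estimates the mutual information with C (the expected reduction in uncertainty about C) from A alone and from A and B together:

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def flip(bit, p, rng):
    """Return the bit flipped with probability p (hypothetical noise model)."""
    return bit ^ int(rng.random() < p)


rng = random.Random(0)
n = 20000
A = [rng.randint(0, 1) for _ in range(n)]
B = [flip(a, 0.1, rng) for a in A]  # B is a noisy copy of A
C = [flip(b, 0.1, rng) for b in B]  # C is a noisy copy of B

i_a_c = mutual_information(A, C)                  # information from A alone
i_ab_c = mutual_information(list(zip(A, B)), C)   # information from A and B
gain_from_b = i_ab_c - i_a_c                      # estimate of I(B;C|A)

print(f"I(A;C)   = {i_a_c:.3f} bits")
print(f"I(A,B;C) = {i_ab_c:.3f} bits")
print(f"gain from adding B = {gain_from_b:.3f} bits")
```

In this setup B sits closer to C than A does, so observing B alongside A raises the estimate. Note that this only applies when B is actually observed: a value of B that is itself predicted from A is a function of A and cannot carry more information about C than A already does.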

Now, whether or not that's the case depends on what you're using for the corr metric. But if A and C both depend on B, at least some values of B will provide information gain. In general, I'd always include dependent variables in a model, unless the dependence is too strong, in which case some models may not work as well (e.g. neural networks).
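To illustrate why the choice of corr metric matters, here is a small sketch, assuming Pearson correlation as the metric and a toy dataset made up for this example. C is fully determined by B, yet the Pearson correlation is zero, while the mutual information is clearly positive:

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


# Toy data (an assumption for illustration): C = B squared,
# so C depends on B completely, but not linearly.
B = [-1, 0, 1] * 100
C = [b * b for b in B]

r = pearson(B, C)
mi = mutual_information(B, C)

print(f"Pearson corr(B, C) = {r:.3f}")
print(f"I(B;C) = {mi:.3f} bits")
```

A linear correlation metric can therefore understate how much B says about C, so a low corr value alone is not evidence that a variable has no information gain.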
