This is probably super simple, but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful, and I was following everything about entropy and information gain until I got to the part where it computes the entropy for each attribute.
I don't understand how the entropy for each individual attribute value (sunny, windy, rainy) is calculated -- specifically, how p_i is calculated. It seems different from the way it is calculated for Entropy(S). Can anyone explain the process behind this calculation?
To split a node into two child nodes, one method consists of splitting on the variable that maximises your information gain. When you reach a pure leaf node, the information gain equals 0 (because you can't gain any information by splitting a node whose rows all belong to a single class -- logic).

In your example, Entropy(S) = 1.571 is your current entropy -- the one you have before splitting. Let's call it HBase. Then you compute the entropy of each child node produced by a candidate split. The p_i are calculated exactly the same way as for Entropy(S), just restricted to the child's rows: p_i is the fraction of rows in that child node belonging to class i, which is why the numbers look different for each attribute value. To get your Information Gain, you subtract the weighted entropy of your child nodes from HBase:

gain = HBase - child1NumRows/numOfRows * entropyChild1 - child2NumRows/numOfRows * entropyChild2

The objective is to pick the split with the best of all Information Gains!
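For concreteness, here is a minimal Python sketch of that computation. It generalises the two-child formula above to ID3's multiway split (one child per attribute value). The function names (entropy, information_gain) and the toy rows are my own, made up for illustration -- not from the site you were reading:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels.
    p_i = (rows with class i) / (total rows in this node)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, label):
    """HBase minus the weighted entropy of each child subset,
    with one child per distinct value of `attribute`."""
    h_base = entropy([row[label] for row in rows])
    total = len(rows)
    # Group the label of each row by its attribute value.
    children = {}
    for row in rows:
        children.setdefault(row[attribute], []).append(row[label])
    weighted = sum(len(subset) / total * entropy(subset)
                   for subset in children.values())
    return h_base - weighted

# Hypothetical toy data, loosely in the spirit of the weather example.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "windy", "play": "yes"},
    {"outlook": "windy", "play": "no"},
]
print(information_gain(rows, "outlook", "play"))  # ~0.571
```

Notice that entropy uses the exact same formula for S and for each child -- only the set of rows that p_i is computed over changes, which is what makes the per-attribute-value entropies come out different.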
Hope this helps! :)