This is probably super simple, but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful, and I was following everything about entropy and information gain until I got to the part where it computes the entropy for each attribute.
I don't understand how the entropy for each individual attribute value (sunny, windy, rainy) is calculated -- specifically, how p_i is calculated. It seems different from the way it is calculated for Entropy(S). Can anyone explain the process behind this calculation?
To split a node into two child nodes, one method consists of splitting on the variable that maximises your information gain. When you reach a pure leaf node, the information gain equals 0 (because you can't gain any information by splitting a node whose rows all belong to a single class -- logic).

In your example, Entropy(S) = 1.571 is your current entropy -- the one you have before splitting. Let's call it HBase. Then you compute the entropy of each child node produced by a candidate split. The p_i are calculated exactly the same way as for Entropy(S), just restricted to the child's rows: p_i is the fraction of rows in that child node belonging to class i, which is why the numbers look different for each attribute value. To get your Information Gain, you subtract the weighted entropy of your child nodes from HBase:

gain = HBase - child1NumRows/numOfRows * entropyChild1 - child2NumRows/numOfRows * entropyChild2

The objective is to pick the split with the best of all Information Gains!
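For concreteness, here is a minimal Python sketch of that computation. It generalises the two-child formula above to ID3's multiway split (one child per attribute value). The function names (entropy, information_gain) and the toy rows are my own, made up for illustration -- not from the site you were reading:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels.
    p_i = (rows with class i) / (total rows in this node)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, label):
    """HBase minus the weighted entropy of each child subset,
    with one child per distinct value of `attribute`."""
    h_base = entropy([row[label] for row in rows])
    total = len(rows)
    # Group the label of each row by its attribute value.
    children = {}
    for row in rows:
        children.setdefault(row[attribute], []).append(row[label])
    weighted = sum(len(subset) / total * entropy(subset)
                   for subset in children.values())
    return h_base - weighted

# Hypothetical toy data, loosely in the spirit of the weather example.
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "windy", "play": "yes"},
    {"outlook": "windy", "play": "no"},
]
print(information_gain(rows, "outlook", "play"))  # ~0.571
```

Notice that entropy uses the exact same formula for S and for each child -- only the set of rows that p_i is computed over changes, which is what makes the per-attribute-value entropies come out different.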
Hope this helps! :)