You can use information gain to decide which attribute to test at each level of a decision tree. By using information gain as a criterion, we estimate how much information each attribute carries about the class label. To do so, we borrow a few concepts from information theory.
The randomness or uncertainty of a random variable X is measured by its entropy.
Consider a binary classification problem with only two classes, positive and negative:
- If all examples are positive or all are negative, the entropy is zero, i.e., low.
- If half of the examples belong to the positive class and half to the negative class, the entropy is one, i.e., high.
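The two extreme cases above can be checked with a minimal sketch of binary entropy (the function name and values here are illustrative, not from the original text):

```python
import math

def binary_entropy(p_positive):
    """Entropy (in bits) of a binary class distribution,
    given the fraction of positive examples."""
    p = p_positive
    if p == 0 or p == 1:
        return 0.0  # a pure node has zero entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(1.0))  # all positive -> 0.0 (low)
print(binary_entropy(0.5))  # half and half -> 1.0 (high)
```

Entropy is maximal at a 50/50 split and falls to zero as the node becomes pure, which is exactly why it works as a measure of uncertainty.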
By calculating the entropy of the dataset before and after splitting on each attribute, we can calculate that attribute's information gain. Information gain is the expected reduction in entropy due to sorting on the attribute: the entropy of the full set minus the weighted average entropy of the subsets produced by the split.
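The definition above can be sketched in a few lines of Python. The toy "windy"/"play" data is a hypothetical example, not from the original text:

```python
from collections import Counter
import math

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Expected reduction in entropy from partitioning
    the labels by the values of one attribute."""
    n = len(labels)
    # Group labels by attribute value
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets after the split
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Toy example: how much does "windy" tell us about "play"?
windy = ["yes", "yes", "no", "no", "no", "no"]
play  = ["no",  "no",  "yes", "yes", "yes", "no"]
print(round(information_gain(windy, play), 3))  # -> 0.459
```

The attribute with the highest information gain is the one chosen for the split at each level of the tree.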