Information gain is built on the concept of entropy.

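Information gain is calculated by comparing the entropy of a dataset before and after a transformation, such as splitting it on a feature: it is the entropy before the split minus the weighted average entropy of the resulting subsets.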
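As a rough illustration of that before-and-after comparison, here is a minimal sketch in plain Python. The function names `entropy` and `information_gain` and the toy labels are assumptions made for this example, not part of any particular library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    after = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - after


# Hypothetical toy labels: 4 "yes" and 4 "no" before the split.
parent = ["yes"] * 4 + ["no"] * 4
# A hypothetical split that separates the classes only partially.
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]

gain = information_gain(parent, [left, right])
print(f"entropy before: {entropy(parent):.3f} bits")
print(f"information gain: {gain:.3f} bits")
```

With these toy labels the entropy before the split is 1 bit and the gain is roughly 0.19 bits, because each subset is still a 3-to-1 mix of the two classes.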

Entropy is a measure of uncertainty (or impurity) in the data. The goal is to reduce entropy and so maximize information gain. The feature that yields the highest information gain is treated as the most important by the algorithm and is used for training the model.

Information gain is a way to find the most relevant features in a dataset. Selecting features this way helps the model make better predictions.
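One way to picture feature selection by information gain is to score each feature and rank them. The sketch below assumes categorical features and a tiny made-up dataset; the helper `feature_gain` and the column names (`outlook`, `windy`, `play`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def feature_gain(rows, feature, target):
    """Information gain from grouping the rows by one categorical feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


# Hypothetical toy data: "outlook" predicts "play" perfectly, "windy" barely.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gains = {f: feature_gain(rows, f, "play") for f in ("outlook", "windy")}
ranked = sorted(gains, key=gains.get, reverse=True)
for name in ranked:
    print(f"{name}: {gains[name]:.3f} bits")
```

Here `outlook` ranks first because grouping by it leaves every subset with a single class, while `windy` leaves both subsets mixed.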

Decision trees and random forests can also use information gain (or Gini impurity) to find the best split at each node of a tree.
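To show how the two criteria compare when scoring a split, here is a small sketch that computes the impurity decrease under both entropy and Gini impurity. The helper names (`class_probs`, `gini`, `split_score`) and the example splits are assumptions for illustration; real tree implementations are more elaborate.

```python
from collections import Counter
from math import log2


def class_probs(labels):
    """Relative frequency of each class in a sequence of labels."""
    total = len(labels)
    return [n / total for n in Counter(labels).values()]


def entropy(labels):
    """Shannon entropy in bits."""
    return sum(-p * log2(p) for p in class_probs(labels))


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probs(labels))


def split_score(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    total = len(parent)
    weighted = sum(len(c) / total * impurity(c) for c in children)
    return impurity(parent) - weighted


parent = [1, 1, 1, 1, 0, 0, 0, 0]
split_a = ([1, 1, 1, 0], [1, 0, 0, 0])  # hypothetical split, partly mixed
split_b = ([1, 1, 1, 1], [0, 0, 0, 0])  # hypothetical split, fully separated

for name, impurity in (("entropy", entropy), ("gini", gini)):
    a = split_score(parent, split_a, impurity)
    b = split_score(parent, split_b, impurity)
    print(f"{name}: split_a={a:.3f}, split_b={b:.3f}")
```

Both criteria prefer `split_b` in this example. They use different scales, so their scores are not directly comparable, but on simple cases like this one they tend to agree on which split is better.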

If the entropy of a dataset is zero, all of its samples belong to one class, so no further information can be gained by splitting it on any feature. The higher the entropy, the harder it is to draw conclusions from the data.
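The two ends of that scale can be checked with a short sketch. The label lists below are made up for illustration, and `entropy` is a hypothetical helper rather than a library function.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


pure = ["spam"] * 8                      # one class only
skewed = ["spam"] * 7 + ["ham"]          # mostly one class
balanced = ["spam"] * 4 + ["ham"] * 4    # evenly mixed

for name, labels in (("pure", pure), ("skewed", skewed), ("balanced", balanced)):
    print(f"{name:9s} entropy = {entropy(labels):.3f} bits")
```

The pure list has zero entropy, the evenly mixed list reaches the two-class maximum of 1 bit, and the skewed list falls in between.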