# Confusion Matrix in Machine Learning: Your One Stop Solution

Published on Jun 10, 2019. Kurt is a Big Data and Data Science Expert, working as a...

In supervised machine learning, we usually deal with two types of problems: regression and classification. A confusion matrix summarizes how the predictions of a classification model compare with the actual labels, which lets us calculate the accuracy and other metrics that describe the performance of the model. It is a key step when it comes to evaluating a classifier. I’ll be covering the following topics in this article:

1. What is a Confusion Matrix?

2. Accuracy and Components of Confusion Matrix

3. Precision, Recall, and F-Measure

4. Creating a Confusion Matrix by using Python and Sklearn

## What is a Confusion Matrix?

A confusion matrix is a summary that compares the predicted results with the actual results in a classification problem. This comparison is needed to judge how well the model performs after it has been trained on some training data. For a binary classification use case, the confusion matrix is a 2×2 matrix, as shown below:

| | Predicted Class 1 Value (e.g. 1) | Predicted Class 2 Value (e.g. 0) |
|---|---|---|
| Actual Class 1 Value (e.g. 1) | TP (True Positive) | FN (False Negative) |
| Actual Class 2 Value (e.g. 0) | FP (False Positive) | TN (True Negative) |

In the matrix above, we have:

• Actual Class 1 value = 1, which corresponds to the positive value in a binary outcome.
• Actual Class 2 value = 0, which corresponds to the negative value in a binary outcome.

The row labels on the left of the confusion matrix indicate the actual values, and the column labels along the top indicate the predicted values.

A confusion matrix is made up of the following components:

Positive (P): The predicted result is positive (example: the image is a cat).

Negative (N): The predicted result is negative (example: the image is not a cat).

True Positive (TP): Both the predicted and the actual value are 1 (positive).

True Negative (TN): Both the predicted and the actual value are 0 (negative).

False Negative (FN): The predicted value is 0 (negative) but the actual value is 1. The two values do not match, hence it is a false negative.

False Positive (FP): The predicted value is 1 (positive) but the actual value is 0. Here again the two values do not match, hence it is a false positive.
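The component definitions above can be sketched in a few lines of plain Python. This is only an illustration: the two label lists are made-up data, and the helper name `count_components` is hypothetical, not part of any library.

```python
# Hypothetical labels for illustration: 1 = positive (cat), 0 = negative (not a cat).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]


def count_components(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # predicted positive, actually positive
        elif a == 0 and p == 0:
            tn += 1  # predicted negative, actually negative
        elif a == 0 and p == 1:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive (labels assumed to be 0 or 1)
    return tp, tn, fp, fn


tp, tn, fp, fn = count_components(actual, predicted)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

For these made-up lists the counts are TP = 3, TN = 3, FP = 1 and FN = 1, which together account for all 8 samples.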

## Accuracy and Components of Confusion Matrix

After the confusion matrix is created and we have determined the values of all its components, it becomes quite easy to calculate the accuracy. So, let us have a look at the metrics built from these components to understand this better.

• Classification Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In the above formula, the sum of TP (True Positive) and TN (True Negative) is the number of correct predictions. Hence, in order to calculate the accuracy, we divide it by the sum of all four components, which is the total number of predictions. However, accuracy has some problems and we cannot completely depend on it.

Let us consider a dataset that is heavily imbalanced. In this scenario, 98% accuracy can be good or bad depending on the problem statement. For example, if 98% of the samples are negative, a model that always predicts negative reaches 98% accuracy without detecting a single positive. Hence we have some more key terms which help us to interpret the accuracy we calculate. The terms are given below:

• TPR (True Positive Rate) or Sensitivity:

The True Positive Rate, also known as Sensitivity, measures the percentage of true positives with respect to the total actual positives, which is given by (TP + FN).

| | Predicted Class 1 Value (e.g. 1) | Predicted Class 2 Value (e.g. 0) | Total |
|---|---|---|---|
| Actual Class 1 Value (e.g. 1) | TP (True Positive) | FN (False Negative) | Total Actual Positives |
| Actual Class 2 Value (e.g. 0) | FP (False Positive) | TN (True Negative) | Total Actual Negatives |

TPR = True Positive / (True Positive + False Negative)

• TNR (True Negative Rate) or Specificity:

The True Negative Rate, or Specificity, measures the percentage of true negatives with respect to the total actual negatives, which is given by (TN + FP). This is the "Total Actual Negatives" row of the table above.

TNR = True Negative / (True Negative + False Positive)

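• False Positive Rate (FPR):

The False Positive Rate is the percentage of actual negatives that are wrongly predicted as positive, that is, the ratio of false positives (FP) to the total actual negatives (FP + TN). It is equal to 1 - TNR.

FPR = False Positive / (False Positive + True Negative)

The denominator is the row total of actual negatives, not the column total of predicted positives shown in the table below. Dividing FP by the total predicted positives (TP + FP) gives a different quantity, the false discovery rate, which equals 1 - Precision.

| | Predicted Class 1 Value (e.g. 1) | Predicted Class 2 Value (e.g. 0) |
|---|---|---|
| Actual Class 1 Value (e.g. 1) | TP (True Positive) | FN (False Negative) |
| Actual Class 2 Value (e.g. 0) | FP (False Positive) | TN (True Negative) |
| Sum | Total Predicted Positives | Total Predicted Negatives |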

• False Negative Rate (FNR):

The False Negative Rate is the percentage of actual positives that are wrongly predicted as negative, that is, the ratio of false negatives (FN) to the total actual positives (FN + TP). It is equal to 1 - TPR.

FNR = False Negative / (False Negative + True Positive)

Dividing FN by the total predicted negatives (TN + FN) instead gives a different quantity, the false omission rate.
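To see why accuracy alone can mislead on imbalanced data, here is a minimal sketch that computes accuracy together with the four rates from raw counts. It uses the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The helper name `rates` and the counts (980 negatives, 20 positives, a model that always predicts negative) are assumptions made up for illustration, and the function assumes both classes are present so that no denominator is zero.

```python
def rates(tp, tn, fp, fn):
    """Return accuracy and the four rates from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "tpr": tp / (tp + fn),  # sensitivity: share of actual positives found
        "tnr": tn / (tn + fp),  # specificity: share of actual negatives found
        "fpr": fp / (fp + tn),  # equals 1 - TNR
        "fnr": fn / (fn + tp),  # equals 1 - TPR
    }


# Hypothetical imbalanced data: 980 negatives, 20 positives,
# scored by a model that predicts "negative" for every sample.
always_negative = rates(tp=0, tn=980, fp=0, fn=20)

for name, value in always_negative.items():
    print(f"{name}: {value:.2f}")
```

The accuracy comes out at 0.98, yet the TPR is 0.00 and the FNR is 1.00: the model misses every positive sample, which accuracy by itself does not reveal.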

## Precision, Recall, and F-Measure

• Recall:

Recall is the same as the True Positive Rate: it is the ratio of the number of correctly predicted positive values (TP) to all the actual positive values.

Recall = TP / (TP + FN)

• Precision:

Precision tells us what percentage of all the points the model predicted to be positive are actually positive.

Precision = TP / (TP + FP)

Precision and Recall are metrics which focus on the positive class, as the above formulas show.

• F-Measure

The F-Measure combines Precision and Recall, using the harmonic mean in place of the usual arithmetic mean, so that extreme values are punished. The F-measure is also called the F1-score and is given by the formula below.

F-measure = (2 * Recall * Precision) / (Recall + Precision)

Let us consider an example and see how we can compute the Accuracy, Precision, Recall and the F1-score.

| N = 280 | Predicted YES | Predicted NO |
|---|---|---|
| Actual YES | TP = 150 | FN = 10 |
| Actual NO | FP = 20 | TN = 100 |

• Accuracy = (TP + TN) / (TP + TN + FP + FN) = (150 + 100) / (150 + 100 + 20 + 10) = 0.89
• Recall = TP / (TP + FN) = 150 / (150 + 10) = 0.94
• Precision = TP / (TP + FP) = 150 / (150 + 20) = 0.88

• F-measure = (2 * Recall * Precision) / (Recall + Precision) = (2 * 0.94 * 0.88) / (0.94 + 0.88) = 0.91
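The same arithmetic can be checked with a short Python sketch. It uses only the four counts from the example table; the variable names are chosen here for illustration.

```python
# Counts taken from the worked example above.
tp, fn, fp, tn = 150, 10, 20, 100

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
# Harmonic mean of precision and recall.
f_measure = 2 * recall * precision / (recall + precision)

print(f"accuracy={accuracy:.4f} recall={recall:.4f} "
      f"precision={precision:.4f} f_measure={f_measure:.4f}")
```

Without intermediate rounding, the values are 0.8929 for accuracy, 0.9375 for recall, 0.8824 for precision and 0.9091 for the F-measure.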

## Creating a Confusion Matrix by using Python and Sklearn

Now we will see an example of how we can create a confusion matrix using Python along with the sklearn library.

1. First, we will create lists of the actual and the predicted labels to compare, as shown below:

```
# Python script for confusion matrix creation.

actual_data = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted_data = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

```

2. We need to import the confusion_matrix function from the sklearn library as shown below:

```
from sklearn.metrics import confusion_matrix

```

3. Next, we will create the confusion matrix as shown below:

```
final_results = confusion_matrix(actual_data, predicted_data)

```

4. Now we can go ahead and calculate the accuracy by importing accuracy_score as shown below:

```
from sklearn.metrics import accuracy_score
accuracy=accuracy_score(actual_data,predicted_data)

```

5. Finally, we compute the F1-score (F-Measure), along with precision and recall, using classification_report as shown below:

```
from sklearn.metrics import classification_report
report=classification_report(actual_data,predicted_data)

```
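One detail is worth checking when reading the printed matrix. For 0/1 labels, `confusion_matrix` orders the classes by sorted label, so the result is laid out as [[TN, FP], [FN, TP]], which is the reverse of the table layout used earlier in this article. The sketch below assumes scikit-learn is installed and reuses the lists from step 1. It unpacks the four components with `ravel()`, passes `labels=[1, 0]` to put the positive class first, and computes precision, recall and F1 individually. The variable name `positive_first` is chosen here for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

actual_data = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted_data = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

# Default layout for 0/1 labels: rows are actual, columns are predicted,
# ordered by sorted label, i.e. [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(actual_data, predicted_data).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Listing the positive class first gives [[TP, FN], [FP, TN]].
positive_first = confusion_matrix(actual_data, predicted_data, labels=[1, 0])
print(positive_first)

# Individual scores for the positive class (label 1 by default).
precision = precision_score(actual_data, predicted_data)
recall = recall_score(actual_data, predicted_data)
f1 = f1_score(actual_data, predicted_data)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For these lists the components are TP = 5, FN = 1, FP = 2 and TN = 5, so precision is 5/7, recall is 5/6 and the F1-score is 10/13.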

Below is the Complete Code:

```
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Actual labels and the labels predicted by the model
actual_data = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted_data = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

# Confusion matrix: rows are actual classes, columns are predicted classes
final_results = confusion_matrix(actual_data, predicted_data)
print(final_results)

# Fraction of predictions that match the actual labels
accuracy = accuracy_score(actual_data, predicted_data)
print(accuracy)

# Precision, recall and F1-score for each class
report = classification_report(actual_data, predicted_data)
print(report)
```

So, with this, we come to the end of this article. I hope all your confusion about the Confusion Matrix is now resolved.
