Confusion Matrix in Machine Learning: Your One Stop Solution


In supervised machine learning, we usually deal with two types of problems: regression and classification. A confusion matrix summarizes how the predictions of a classification model compare with the actual labels, and from it we can calculate the accuracy and other metrics that describe the performance of the model. It is one of the most important steps when it comes to evaluating a classification model. I’ll be covering the following topics in this article:

1. What is a Confusion Matrix?

2. Accuracy and Components of Confusion Matrix

3. Precision, Recall and F-Measure

4. Creating a Confusion Matrix by using Python and Sklearn

What is a Confusion Matrix?

A confusion matrix is a summary that compares the predicted results with the actual results in a classification problem. This comparison is necessary to determine how well the model performs after it has been trained on some training data.

For a binary classification use case, a confusion matrix is a 2×2 matrix, as shown below:

	Predicted Class 1 Value EG: 1	Predicted Class 2 Value EG: 0
Actual Class 1 Value EG: 1	TP (True Positive)	FN (False Negative)
Actual Class 2 Value EG: 0	FP (False Positive)	TN (True Negative)

In the matrix above:

Actual Class 1 value = 1, which corresponds to the positive value in a binary outcome.
Actual Class 2 value = 0, which corresponds to the negative value in a binary outcome.

The rows of the confusion matrix (the left-side index) indicate the actual values, and the columns (the top header) indicate the predicted values.

A confusion matrix is built from the following components:

Positive (P): The predicted result is positive (Example: the image is a cat)

Negative (N): The predicted result is negative (Example: the image is not a cat)

True Positive (TP): Both the predicted value and the actual value are 1 (positive).

True Negative (TN): Both the predicted value and the actual value are 0 (negative).

False Negative (FN): The predicted value is 0 (negative) but the actual value is 1. The two values do not match, so the negative prediction is false.

False Positive (FP): The predicted value is 1 (positive) but the actual value is 0. Again the two values do not match, so the positive prediction is false.
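The four components can also be counted by hand. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are made up for illustration, with 1 standing for "cat" and 0 for "not a cat".

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification problem."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1   # predicted positive, actually positive
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fn, fp, tn


# Hypothetical labels: 1 = "cat", 0 = "not a cat".
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # 2 1 1 2
```

Every sample falls into exactly one of the four cells, so the four counts always add up to the number of samples.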


Accuracy and Components of Confusion Matrix

Once the confusion matrix is created and we know the values of all its components, it becomes quite easy to calculate the accuracy. So, let us have a look at the components to understand this better.

Classification Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In the above formula, the sum of TP (True Positive) and TN (True Negative) is the number of correct predictions. To get the accuracy, we divide this sum by the total number of predictions, that is, by the sum of all four components. However, accuracy has some limitations, and we cannot depend on it alone.

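Consider a dataset that is heavily imbalanced. In this scenario, 98% accuracy can be good or bad depending on the problem statement: if 98% of the samples belong to one class, a model that always predicts that class also reaches 98% accuracy without detecting a single sample of the other class. Hence we have some more key terms that help us interpret the accuracy we calculate. The terms are given below: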
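To make the problem with imbalanced data concrete, here is a small sketch. The data is invented for illustration: 98 negative samples, 2 positive samples, and a "model" that ignores its input and always predicts the negative class.

```python
# Hypothetical imbalanced data: 98 negatives (0) and 2 positives (1).
actual = [0] * 98 + [1] * 2

# A trivial "model" that always predicts the negative class.
predicted = [0] * 100

# Accuracy: fraction of predictions that match the actual value.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Fraction of the actual positives that the model detected.
positives_found = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
detected = positives_found / actual.count(1)

print(accuracy)  # 0.98
print(detected)  # 0.0
```

The accuracy looks excellent, yet the model never finds a positive sample. This is why accuracy should be read together with the other terms when the classes are imbalanced.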

TPR (True Positive Rate) or Sensitivity:

True Positive Rate, also known as Sensitivity, measures the proportion of True Positives with respect to the Total Actual Positives, which is given by (TP + FN).

	Predicted Class 1 Value EG: 1	Predicted Class 2 Value EG: 0	Total
Actual Class 1 Value EG: 1	TP (True Positive)	FN (False Negative)	Total Actual Positives
Actual Class 2 Value EG: 0	FP (False Positive)	TN (True Negative)	Total Actual Negatives

TPR = True Positive / (True Positive + False Negative)

TNR (True Negative Rate) or Specificity:

True Negative Rate, or Specificity, measures the proportion of True Negatives with respect to the Total Actual Negatives, which is given by (TN + FP).

	Predicted Class 1 Value EG: 1	Predicted Class 2 Value EG: 0	Total
Actual Class 1 Value EG: 1	TP (True Positive)	FN (False Negative)	Total Actual Positives
Actual Class 2 Value EG: 0	FP (False Positive)	TN (True Negative)	Total Actual Negatives

TNR = True Negative / (True Negative + False Positive)

False Positive Rate(FPR):

False Positive Rate is the proportion of False Positives (FP) with respect to the Total Actual Negatives (FP + TN). It is equal to 1 - TNR.

	Predicted Class 1 Value EG: 1	Predicted Class 2 Value EG: 0	Total
Actual Class 1 Value EG: 1	TP (True Positive)	FN (False Negative)	Total Actual Positives
Actual Class 2 Value EG: 0	FP (False Positive)	TN (True Negative)	Total Actual Negatives

FPR = False Positive / (False Positive + True Negative)

False Negative Rate (FNR):

False Negative Rate is the proportion of False Negatives (FN) with respect to the Total Actual Positives (FN + TP). It is equal to 1 - TPR.

	Predicted Class 1 Value EG: 1	Predicted Class 2 Value EG: 0	Total
Actual Class 1 Value EG: 1	TP (True Positive)	FN (False Negative)	Total Actual Positives
Actual Class 2 Value EG: 0	FP (False Positive)	TN (True Negative)	Total Actual Negatives

FNR = False Negative / (False Negative + True Positive)
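The four rates can be computed together from the component counts. The sketch below uses the standard definitions, in which each rate is divided by the total of the actual class: TPR and FNR by the actual positives (TP + FN), TNR and FPR by the actual negatives (TN + FP). The function name `rates` and the counts passed to it are made up for illustration.

```python
def rates(tp, fn, fp, tn):
    """Return TPR, TNR, FPR and FNR computed from the four counts."""
    actual_positives = tp + fn
    actual_negatives = tn + fp
    return {
        "TPR": tp / actual_positives,  # sensitivity
        "TNR": tn / actual_negatives,  # specificity
        "FPR": fp / actual_negatives,  # equals 1 - TNR
        "FNR": fn / actual_positives,  # equals 1 - TPR
    }


# Hypothetical counts, chosen only to give round numbers.
r = rates(tp=40, fn=10, fp=5, tn=45)
for name, value in r.items():
    print(name, round(value, 2))
```

Because TPR and FNR share a denominator, they always add up to 1, and the same holds for TNR and FPR.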

Precision, Recall, and F-Measure

Recall:

Recall is the same as the True Positive Rate: it is the ratio of the number of correctly predicted positive values (TP) to all the actual positive values (TP + FN).

Recall = TP / (TP + FN)

Precision:

Precision looks at all the points the model predicted to be positive and indicates what proportion of them are actually positive.

Precision = TP / (TP + FP)

Precision and Recall are metrics that focus on the positive class: neither of them takes the True Negatives (TN) into account.

F-Measure

F-Measure combines Precision and Recall into a single score. It uses the harmonic mean in place of the usual arithmetic mean, so a very low value of either Precision or Recall pulls the score down sharply. The F-measure is also called the F1-score and is given by the formula below.

F-measure = (2 * Recall * Precision) / (Recall + Precision)
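The effect of the harmonic mean is easiest to see with an extreme pair of values. In this sketch the precision and recall values are invented for illustration, and `f1` is a hypothetical helper that implements F-measure = 2 * Precision * Recall / (Precision + Recall).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical model: perfect precision but very poor recall.
precision, recall = 1.0, 0.1

arithmetic = (precision + recall) / 2
harmonic = f1(precision, recall)

print(round(arithmetic, 2))  # 0.55
print(round(harmonic, 2))    # 0.18
```

The arithmetic mean still looks acceptable, while the F-measure stays close to the weaker of the two values.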

Let us consider an example and see how we can compute the Accuracy, Precision, Recall and the F1-score.

N = 280	Predicted YES	Predicted NO
Actual YES	TP = 150	FN = 10
Actual NO	FP = 20	TN = 100

- Accuracy = (TP + TN) / (TP + TN + FP + FN) = (150 + 100) / (150 + 100 + 20 + 10) = 0.89
- Recall = TP / (TP + FN) = 150 / (150 + 10) = 0.94
- Precision = TP / (TP + FP) = 150 / (150 + 20) = 0.88
- F-measure = (2 * Recall * Precision) / (Recall + Precision) = (2 * 0.94 * 0.88) / (0.94 + 0.88) = 0.91
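The same example can be checked in a few lines of Python. This sketch works directly from the four counts in the table and rounds only at the end, so no intermediate rounding affects the results.

```python
# Counts taken from the example table above.
tp, fn, fp, tn = 150, 10, 20, 100

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f_measure = 2 * recall * precision / (recall + precision)

print(total)                 # 280
print(round(accuracy, 4))    # 0.8929
print(round(recall, 4))      # 0.9375
print(round(precision, 4))   # 0.8824
print(round(f_measure, 4))   # 0.9091
```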

Creating a Confusion Matrix by using Python and Sklearn

Now we will see an example of how to create a confusion matrix using Python and the sklearn library.

1. First, we create two lists, one holding the actual values and one holding the predicted values, so that we can check the accuracy:


# Python script for confusion matrix creation.

actual_data = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted_data = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

2. We need to import the confusion matrix from the sklearn library as shown below:


from sklearn.metrics import confusion_matrix

3. Next, we will create the confusion matrix as shown below:


final_results = confusion_matrix(actual_data, predicted_data)

4. Now we can calculate the accuracy by importing accuracy_score, as shown below:


from sklearn.metrics import accuracy_score
accuracy = accuracy_score(actual_data, predicted_data)

5. Finally, we generate the classification report, which lists the precision, recall and F1-score (F-measure) for each class:


from sklearn.metrics import classification_report
report = classification_report(actual_data, predicted_data)

Below is the Complete Code:


# Python script for confusion matrix creation.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

actual_data = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted_data = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

# Rows are the actual classes and columns are the predicted classes.
final_results = confusion_matrix(actual_data, predicted_data)
print(final_results)

# Fraction of predictions that match the actual values.
accuracy = accuracy_score(actual_data, predicted_data)
print(accuracy)

# Precision, recall and F1-score for each class.
report = classification_report(actual_data, predicted_data)
print(report)
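One detail worth knowing when reading the printed matrix: by default sklearn orders the classes by sorted label, so for labels 0 and 1 the layout is [[TN, FP], [FN, TP]], which is not the positive-first layout used in the tables of this article. The sketch below, which assumes the same two lists as above, unpacks the four components with ravel() and then uses the labels parameter to request the positive-first order.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

actual_data = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
predicted_data = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

# Default label order is [0, 1], so the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(actual_data, predicted_data).ravel()
print(tp, fn, fp, tn)  # 5 1 2 5

# labels=[1, 0] puts the positive class first: [[TP, FN], [FP, TN]].
positive_first = confusion_matrix(actual_data, predicted_data, labels=[1, 0])
print(positive_first)

# Individual scores for the positive class (label 1).
precision = precision_score(actual_data, predicted_data)
recall = recall_score(actual_data, predicted_data)
f1 = f1_score(actual_data, predicted_data)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

The individual score functions report the metrics for the positive class only, whereas classification_report prints them for both classes.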

So, with this, we come to an end of this article. I hope all your Confusion about the Confusion Matrix is now resolved.

