Understanding K-means Clustering with Examples | Edureka


K-means is one of the most important algorithms covered in Machine Learning Certification Training. In this blog, we will understand the K-means clustering algorithm with the help of examples.

A hospital care chain wants to open a series of Emergency-Care wards within a region. We assume that the hospital knows the locations of the most accident-prone areas in the region. It has to decide how many Emergency Units to open and where to place them, so that every accident-prone area lies in the vicinity of an Emergency Unit.

The challenge is to decide the locations of these Emergency Units so that the whole region is covered. This is where K-means clustering comes to the rescue!

Before getting to K-means Clustering, let us first understand what Clustering is.

A cluster is a group of similar objects, and clustering is the task of grouping objects into such clusters. To learn clustering, it helps to understand the scenarios in which different objects need to be clustered. Let us identify a few of them.

What is Clustering?

Clustering is dividing data points into homogeneous classes or clusters:

Points in the same group are as similar as possible
Points in different groups are as dissimilar as possible

When a collection of objects is given, we put the objects into groups based on similarity.

Applications of Clustering:

Clustering is used in almost every field. You can take some ideas from Example 1 to come up with a lot of clustering applications that you may already have come across.

Listed here are a few more applications, which add to what you have learnt.

Clustering helps marketers improve their customer base and work on the target areas. It helps group people (according to criteria such as willingness, purchasing power etc.) based on their similarity in ways related to the product under consideration.
Clustering helps identify groups of houses on the basis of their value, type and geographical location.
Clustering is used to study earthquakes. Based on the areas hit by earthquakes in a region, clustering can help analyse the next probable location where an earthquake can occur.

Clustering Algorithms:

A clustering algorithm tries to find natural groups in the data on the basis of some similarity measure. It locates the centroid of each group of data points. To carry out effective clustering, the algorithm evaluates the distance between each point and the centroid of its cluster.
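As a small illustration of the two quantities mentioned here, the sketch below computes the centroid of one group and the distance from each point to it. The data points and the helper name `centroid` are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def centroid(points):
    """Return the mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Hypothetical 2-D data points that form one group
group = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]

c = centroid(group)                      # mean of the x and y coordinates
distances = [dist(p, c) for p in group]  # distance of each point to the centroid

print("centroid:", c)
print("distances:", distances)
```

Here the centroid is (1.5, 1.5) and every point lies at the same distance from it (about 0.707), because the four points form a square.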

The goal of clustering is to determine the intrinsic grouping in a set of unlabelled data.

What is K-means Clustering?

K-means (MacQueen, 1967) is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.

K-means Clustering – Example 1:

A pizza chain wants to open its delivery centres across a city. What do you think would be the possible challenges?

They need to analyse the areas from where the pizza is being ordered frequently.
They need to understand how many pizza stores have to be opened to cover delivery in the area.
They need to figure out the locations for the pizza stores within all these areas in order to keep the distance between the store and delivery points minimum.

Resolving these challenges involves a lot of analysis and mathematics. Let us now see how clustering provides a meaningful and easy method of sorting out such real-life challenges.

K-means Clustering Method:

If k is given, the K-means algorithm can be executed in the following steps:

Partition the objects into k non-empty subsets.
Identify the cluster centroid (mean point) of each subset in the current partition.
Compute the distance from each point to every centroid and allot the point to the cluster whose centroid is nearest.
After re-allotting the points, find the centroid of each new cluster formed.
Repeat the previous two steps until no point changes its cluster.
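The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the round-robin initial partition, the use of Euclidean distance, and the sample order locations are all choices made for this illustration.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) for the given points and number of clusters k."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step 1: initial partition into k non-empty subsets.
    # Assumption: point i simply starts in subset i mod k.
    labels = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(max_iter):
        # Step 2: centroid (mean point) of each cluster in the current partition.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # If a cluster lost all its points, keep its previous centroid.
            new_centroids.append(centroid(members) if members else centroids[j])
        centroids = new_centroids
        # Step 3: allot each point to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Stop once no point changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels

# Hypothetical pizza order locations (x, y) forming two neighbourhoods
orders = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, labels = kmeans(orders, k=2)
print(centroids)  # candidate store locations
print(labels)     # cluster index of each order location
```

With this sample data the two centroids settle at (1.5, 1.5) and (8.5, 8.5), one per neighbourhood. On real data the result depends on the initial partition, so it is common to run the algorithm several times from different starting points.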

The step-by-step process:

Now, let’s consider the problem in Example 1 and see how we can help the pizza chain come up with centres based on the K-means algorithm.

K Means Clustering Algorithm | K Means Example in Python | Machine Learning Algorithms | Edureka

In this video you will learn the concepts of K-means clustering and its implementation using Python.

Similarly, for opening Hospital Care Wards:

K-means clustering will group the locations of the most accident-prone areas into clusters and define a cluster center for each cluster, which will be the location where an Emergency Unit opens. These cluster centers are the centroids of the clusters: each centroid minimizes the total squared distance to the points of its cluster. Hence, each Emergency Unit will be close to all the accident-prone areas within its cluster.

Here is another example for you. Try to come up with the solution based on your understanding of K-means clustering.

K-means Clustering – Example 2:

Let’s consider data on drug-related crimes in Canada. The data covers crimes involving various drugs, ranging from heroin and cocaine to prescription drugs, especially crimes committed by underage people. The crimes resulting from such substance abuse can be brought down by starting de-addiction centres in the areas most afflicted by this kind of crime. With the available data, different objectives can be set. They are:

Group the crimes based on the substance abused to detect the prominent cause.
Group the crimes based on age groups.
Analyze the data to determine what kinds of de-addiction centres are required.
Find out how many de-addiction centres need to be set up to reduce the drug-related crime rate.

The K-means algorithm can be used to address any of the above objectives by analyzing the available data.

Following the K-means clustering method used in the previous example, we can start off with a given k, followed by the execution of the K-means algorithm.

Mathematical Formulation for K-means Algorithm:

D = {x₁, x₂, …, x_i, …, x_m} → data set of m records

x_i = (x_i1, x_i2, …, x_in) → each record is an n-dimensional vector

Finding Cluster Centers that Minimize Distortion:

Solution can be found by setting the partial derivative of Distortion w.r.t. each cluster center to zero.
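The formula for the distortion is not reproduced in the text, so the derivation below assumes the usual definition: the sum of squared Euclidean distances between each record and the center of the cluster it is assigned to. The symbols c_j (center of cluster j), C_j (set of records assigned to cluster j) and k (number of clusters) are notation introduced here for illustration; x_i and m are as defined above.

```latex
% Assumed definition of distortion for a fixed assignment of records to clusters
\text{Distortion} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2

% Setting the partial derivative with respect to one cluster center to zero
\frac{\partial\,\text{Distortion}}{\partial c_j}
  = -2 \sum_{x_i \in C_j} (x_i - c_j) = 0

% Solving for c_j: the minimizing center is the mean of the records in the cluster
c_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Under this assumption, the center that minimizes the distortion of a cluster is simply the mean of its records, which is why the algorithm recomputes each centroid as the mean point of its cluster.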

The value of k should be chosen such that increasing it any further no longer reduces the distortion appreciably: beyond this value, the distortion remains almost constant across further levels of clustering. This point is called the “Elbow”.

This is the ideal value of k for the clusters created.
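To see the elbow in numbers, the sketch below computes the distortion for several values of k on a small hypothetical data set with three visible groups. It is only an illustration under assumptions: distortion is taken to be the sum of squared Euclidean distances to the nearest centroid, and the starting centroids are chosen with a simple farthest-first rule (a swapped-in initialisation, not something the text prescribes) so that the run is deterministic. All function names are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from the centroids chosen so far (an assumed initialisation)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_distortion(points, k, max_iter=100):
    """Run K-means and return the final distortion (sum of squared distances)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Allot each point to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean point of its cluster.
        updated = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(dist(p, c) ** 2 for c in centroids) for p in points)

# Hypothetical 2-D data with three visible groups
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]

distortions = {k: kmeans_distortion(points, k) for k in range(1, 7)}
for k, d in distortions.items():
    print(k, round(d, 2))
```

For this data the distortion falls steeply up to k = 3 and changes very little afterwards, so the elbow, and hence the suggested number of clusters, is at k = 3.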

Related Post:

Application of Clustering in Data Science Using real-time examples.


Comments

8 Comments

Toc says:
May 23, 2018 at 3:56 pm GMT
Hi everyone, I have an eyetracking dataset and want to use it to predict group membership. So given x and y coordinates, can I predict whether someone is a male or female. I haven’t used K-Cluster algorithm before and was wondering if it can be used and how, to answer my question. Thank you for your response.
Toc
Reply
r.kasthuri says:
Sep 16, 2017 at 8:00 am GMT
Sir wil u please provide me kmean mapreduce in r
Reply
krupa jain says:
Jun 7, 2016 at 4:49 pm GMT
what is the difference between plain and iterative mapreduce?
Reply
bob ama says:
Aug 19, 2015 at 8:02 am GMT
“If k is given, the K-means algorithm can be executed in the following steps” but you don’t say where “k” in ‘if k is given’ comes from.
Reply
- yogesh indani says:
  Feb 19, 2016 at 5:20 am GMT
  but k is the number of clusters how can u say in data set
  Reply
rahul says:
Nov 4, 2014 at 12:28 pm GMT
thanks
Reply
- EdurekaSupport says:
  Nov 12, 2014 at 6:53 am GMT
  You are welcome, Rahul!! Please check out other posts as well.
  Reply
sumit says:
Nov 4, 2014 at 11:18 am GMT
nice one
Reply
