How will you define the number of clusters in a clustering algorithm?

For a sample dataset with n dimensions, how would you define the number of clusters in a clustering algorithm?
Aug 21, 2018 in Data Analytics by Anmol


Although the clustering algorithm is not specified, this question most likely refers to K-means clustering, where K defines the number of clusters. The objective of clustering is to group similar entities so that the entities within a group are similar to each other, while the groups themselves are different from each other.

For example, the following image shows three different groups.

[Figure: Clustering]

Within-cluster sum of squares (WSS) is generally used to measure the homogeneity within a cluster. If you plot WSS for a range of values of the number of clusters, you will get the plot shown below.

[Figure: Clustering plot, WSS against the number of clusters]

  • The graph is generally known as the elbow curve.
  • The red-circled point in the graph above, i.e. number of clusters = 6, is the point after which WSS no longer decreases appreciably.
  • This point is known as the bending point (or elbow) and is taken as K in K-means.
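The elbow procedure can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the answer's own code: it implements K-means as Lloyd's algorithm with k-means++ seeding, computes WSS(K) = Σ_{j=1..K} Σ_{x ∈ C_j} ‖x − μ_j‖² (the sum of squared distances from each point to its cluster centroid μ_j), keeps the best of a few random restarts for each K, and then picks the elbow automatically with a second-difference heuristic. That heuristic is an assumption; the answer reads the elbow off the plot by eye. The helper names (`toy_blobs`, `wss_curve`, `elbow_k`, etc.) are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_plus_plus(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = init_plus_plus(points, k, rng)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def wss(points, centroids, labels):
    """Within-cluster sum of squares for a given assignment."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def wss_curve(points, k_values, n_init=5):
    """Best WSS over several restarts for each K (Lloyd's algorithm can stop in a local minimum)."""
    curve = {}
    for k in k_values:
        curve[k] = min(wss(points, *kmeans(points, k, seed=s)) for s in range(n_init))
    return curve


def elbow_k(curve):
    """Assumed heuristic: the elbow is the K where the drop in WSS slows down
    the most (largest second difference). Expects consecutive K values."""
    ks = sorted(curve)
    scores = {k: (curve[prev] - curve[k]) - (curve[k] - curve[nxt])
              for prev, k, nxt in zip(ks, ks[1:], ks[2:])}
    return max(scores, key=scores.get)


def toy_blobs(centers, n_per_center=30, spread=0.5, seed=42):
    """Hypothetical helper: Gaussian blobs around the given centres (demo data only)."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0, spread) for c in center)
            for center in centers for _ in range(n_per_center)]


if __name__ == "__main__":
    data = toy_blobs([(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)])
    curve = wss_curve(data, range(1, 7))
    for k, w in curve.items():
        print(f"K={k}  WSS={w:.1f}")
    print("elbow at K =", elbow_k(curve))
```

With three well-separated synthetic blobs the heuristic should point at K = 3; on real data the curve is often smoother and the elbow less clear-cut, which is why it is usually judged by eye from the plot.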

This is the most widely used approach, but some data scientists also use hierarchical clustering first to create dendrograms and identify the distinct groups from there.
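A minimal sketch of the dendrogram idea, assuming single-linkage agglomerative clustering and a "cut the dendrogram inside its largest vertical gap" rule for choosing the number of groups. Both choices are assumptions for illustration; the answer does not name a linkage or cut rule. In practice SciPy's `scipy.cluster.hierarchy` module (`linkage`, `dendrogram`, `fcluster`) does this work; the function names below are hypothetical.

```python
import math
import random


def single_linkage_heights(points):
    """Naive agglomerative clustering with single linkage.
    Returns the distance at which each successive merge happens, i.e. the
    heights of the joins in the dendrogram (O(n^3), toy sizes only)."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best_d, best_i, best_j = math.inf, 0, 1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]  # best_j > best_i, so best_i's index is unaffected
        heights.append(best_d)
    return heights


def k_from_largest_gap(heights):
    """Assumed cut rule: cut inside the largest gap between successive merge heights.
    Performing merges 0..m leaves n - (m + 1) clusters; needs at least 3 points."""
    n = len(heights) + 1
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=gaps.__getitem__)
    return n - (m + 1)


if __name__ == "__main__":
    rng = random.Random(7)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    data = [tuple(c + rng.gauss(0, 0.5) for c in center)
            for center in centers for _ in range(10)]
    heights = single_linkage_heights(data)
    print("last merge heights:", [round(h, 2) for h in heights[-4:]])
    print("suggested number of groups:", k_from_largest_gap(heights))
```

Single linkage is used here because its merge heights never decrease, so the "largest gap" reading of the dendrogram is well defined; other linkages (average, Ward) are common alternatives.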

answered Aug 21, 2018 by ANMOL

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.