Clustering in R

I want to know the difference between K-means clustering and hierarchical clustering.

Can someone please explain it to me?
Jul 9, 2018 in Data Analytics by DataKing99

2 answers to this question.

A cluster is a group of objects that belong to the same class. Clustering is the process of grouping a set of abstract objects into classes of similar objects.

Requirements of clustering in data analysis:

  • Scalability − We need highly scalable clustering algorithms to deal with large databases.
  • Ability to deal with different kinds of attributes − Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
  • Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be limited to distance measures that tend to find only small spherical clusters.
  • High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
  • Ability to deal with noisy data − Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
  • Interpretability − The clustering results should be interpretable, comprehensible, and usable.

K-means Clustering:

K-means clustering is a well-known partitioning method. In this method, each object is classified as belonging to one of K groups. The result is a set of K clusters, with each object in the data set belonging to exactly one cluster. Each cluster may have a centroid or a cluster representative.

Example: A cluster of documents can be represented by a list of the keywords that occur in at least some minimum number of documents within the cluster. If the number of clusters is large, the centroids can themselves be clustered to produce a hierarchy within the dataset. K-means is a data mining algorithm that clusters data samples using an iterative approach: it repeatedly assigns each object to the nearest centroid and recomputes each centroid from its assigned objects, until the assignments stop changing.
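Since the question is about R, here is a minimal sketch of K-means using the base `kmeans()` function from the `stats` package. The synthetic two-group data, the seed, and the choice of `nstart = 10` are assumptions made for illustration; they do not come from the question.

```r
# Synthetic data: two well-separated 2-D groups (illustrative only)
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)
colnames(x) <- c("x1", "x2")

# Partition the 100 points into K = 2 clusters;
# nstart = 10 repeats the run from 10 random starts and keeps the best result
km <- kmeans(x, centers = 2, nstart = 10)

km$centers         # one centroid per cluster
table(km$cluster)  # number of points assigned to each cluster
```

Note that K has to be chosen up front, which is one practical difference from hierarchical clustering.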

Hierarchical Clustering:

This method creates a hierarchical decomposition of the given set of data objects. We can classify hierarchical methods on the basis of how the hierarchical decomposition is formed. There are two approaches here:

  1. Agglomerative Approach
  2. Divisive Approach

Agglomerative Approach:

This approach is also known as the bottom-up approach. We start with each object forming a separate group. The algorithm keeps merging the objects or groups that are closest to one another until all of the groups are merged into one or until a termination condition holds.
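A small sketch of the bottom-up approach with base R's `dist()`, `hclust()`, and `cutree()`. The synthetic data and the choice of average linkage are assumptions for illustration; other linkage methods would work the same way.

```r
# Synthetic data: 20 points in two well-separated groups (illustrative only)
set.seed(1)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 4, sd = 0.3), ncol = 2)
)

d  <- dist(x)                        # pairwise Euclidean distances
hc <- hclust(d, method = "average")  # bottom-up merging with average linkage

nrow(hc$merge)               # n - 1 merges take n single-object groups down to one
groups <- cutree(hc, k = 2)  # "termination condition": stop at 2 groups
table(groups)
```

Calling `plot(hc)` draws the dendrogram, which shows the order in which the groups were merged.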

Divisive Approach:

This approach is also known as the top-down approach. We start with all of the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters. This is done until each object is in its own cluster or a termination condition holds. Hierarchical methods are rigid, i.e., once a merge or split is done, it can never be undone.
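For the top-down approach, one option in R is `diana()` from the `cluster` package. This sketch assumes that package is installed (it is a recommended package bundled with most R installations, but that is an assumption about your setup), and the data is again synthetic.

```r
# Runs only if the 'cluster' package is available (assumption about your setup)
if (requireNamespace("cluster", quietly = TRUE)) {
  set.seed(1)
  x <- rbind(
    matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
    matrix(rnorm(20, mean = 4, sd = 0.3), ncol = 2)
  )

  # Divisive analysis: start from one cluster holding all objects and split repeatedly
  dv <- cluster::diana(x)

  # Convert to an hclust tree so it can be cut like an agglomerative result
  groups <- cutree(as.hclust(dv), k = 2)
  print(table(groups))
}
```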

answered Jul 9, 2018 by CodingByHeart77

Clustering is a type of unsupervised learning, unlike classification, which is supervised. In clustering, the objects of a dataset are grouped into clusters in such a way that the groups are very different from each other while the objects within the same cluster are very similar to each other.

In classification, a predefined set of classes is given. In clustering, there is no predefined set of classes, which means the resulting clusters are not known before the clustering algorithm is run.

K-means clustering is a well-known partitioning method. In this method, each object is classified as belonging to one of K groups. The result is a set of K clusters, with each object in the data set belonging to exactly one cluster. Each cluster may have a centroid or a cluster representative. For real-valued data, the arithmetic mean of the attribute vectors of all objects within a cluster provides an appropriate representative; other types of centroid may be required in other cases.
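To see the "arithmetic mean as representative" point in R, this sketch compares the centroids reported by `kmeans()` with per-cluster means computed by hand. The data and seed are made up for illustration.

```r
# Synthetic real-valued data (illustrative only)
set.seed(7)
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 6), ncol = 2)
)
km <- kmeans(x, centers = 2, nstart = 5)

# Per-cluster arithmetic mean of each attribute, computed by hand:
# column sums per cluster divided by the cluster sizes
manual <- rowsum(x, km$cluster) / as.vector(table(km$cluster))

# Compare with the centroids returned by kmeans()
all.equal(unname(manual), unname(km$centers))
```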

Example: A cluster of documents can be represented by a list of the keywords that occur in at least some minimum number of documents within the cluster. If the number of clusters is large, the centroids can themselves be clustered to produce a hierarchy within the dataset. K-means is a data mining algorithm that performs clustering of the data samples. As mentioned previously, clustering means dividing a dataset into a number of groups such that similar items belong to the same group. To cluster the data, the K-means algorithm uses an iterative approach.
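The iterative approach can be sketched by hand as two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a simplified Lloyd-style loop written for illustration, not what `kmeans()` runs by default (its default algorithm is Hartigan-Wong). The data, the starting centroids (first and last rows), and the iteration cap are all assumptions.

```r
# Synthetic data: 40 points in two well-separated groups (illustrative only)
set.seed(3)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)
k <- 2
centers <- x[c(1, nrow(x)), , drop = FALSE]  # illustrative starting centroids

for (iter in 1:100) {
  # Assignment step: squared distance from every point to every centroid
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  cluster <- max.col(-d2, ties.method = "first")  # index of the nearest centroid

  # Update step: each centroid becomes the mean of its assigned points
  new_centers <- centers
  for (j in seq_len(k)) {
    if (any(cluster == j)) {
      new_centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
    }
  }

  # Stop once the centroids no longer move
  if (all(abs(new_centers - centers) < 1e-12)) break
  centers <- new_centers
}

table(cluster)
```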

Hierarchical methods are well-known clustering techniques that are commonly used in data mining. A hierarchical clustering scheme produces a sequence of clusterings in which each clustering is nested into the next clustering in the sequence. Since hierarchical clustering is a greedy algorithm based on a local search, the merging decisions made early in the agglomerative process are not necessarily the right ones. One possible solution is to refine the clustering produced by the agglomerative hierarchical algorithm, which can correct mistakes made early in the process.
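The nesting can be checked in R by cutting one `hclust()` tree at several levels. This is a sketch on random data; the helper `nested()` is a hypothetical function defined here for illustration, not part of any package.

```r
# Random 2-D data (illustrative only)
set.seed(5)
x  <- matrix(rnorm(60), ncol = 2)
hc <- hclust(dist(x), method = "complete")

# One column of cluster labels for each requested number of clusters
cuts <- cutree(hc, k = 2:4)
k2 <- cuts[, 1]
k3 <- cuts[, 2]
k4 <- cuts[, 3]

# Hypothetical helper: TRUE if every cluster of the finer clustering
# lies inside exactly one cluster of the coarser clustering
nested <- function(fine, coarse) all(rowSums(table(fine, coarse) > 0) == 1)

nested(k3, k2)
nested(k4, k3)
```

With K-means there is no such guarantee: running it with K = 2 and then K = 3 gives two independent partitions.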

answered Jul 9, 2018 by zombie