Hierarchical Clustering

I have read a few resources and understand how hierarchical clustering works. However, when I compare it with k-means clustering, it seems that k-means produces a specific number of clusters, whereas hierarchical clustering only shows how the samples could be clustered. In other words, I do not get a specific number of clusters from hierarchical clustering; I get only a scheme showing how clusters can be formed and how closely the samples are related.

So I cannot see where this clustering method would be used.
Feb 2 in Machine Learning by Dev

1 answer to this question.


K-means and hierarchical clustering are both clustering algorithms based on distance metrics, i.e., they group data points according to how similar (close) they are to each other.

The two algorithms differ in how the clusters are formed.

In k-means clustering, the number of clusters k is chosen by the user. It is a hyperparameter that has to be decided in advance, which requires some foresight into the data, so a good understanding of the dataset is needed when working with k-means.

In hierarchical clustering, and agglomerative clustering in particular, every data point starts as its own cluster. The most similar clusters are then merged according to a similarity metric, and this repeats until a single cluster remains. Prior knowledge of the number of clusters is not required: the result is a dendrogram (the tree used to represent the hierarchy), and a specific number of clusters can be obtained afterwards by cutting it at a chosen level. Outliers are also easy to spot in the dendrogram, since they tend to join the rest of the data only in the last merges.
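To make the merging process concrete, here is a minimal sketch in plain Python. It is not a production implementation: it assumes single linkage (cluster distance = distance of the closest pair of points) and Euclidean distance, and the function names and the sample points are made up for illustration. Stopping the merging early is one simple way to get a specific number of clusters out of the hierarchy.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    history as (cluster_a, cluster_b, distance) tuples, which is the
    information a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    history = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        history.append((a, b, single_linkage(a, b, points)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(sorted(a + b))
    return sorted(clusters), history


# Hypothetical 2-D data: two tight groups and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]

full_clusters, full_history = agglomerate(points)       # merge down to one cluster
three_clusters, _ = agglomerate(points, n_clusters=3)   # stop at three clusters
```

With this toy data, `three_clusters` groups the two tight pairs and leaves the far-away point on its own, and the last entry of `full_history` shows that point joining everything else at a much larger distance than the earlier merges. In practice a library routine would normally be used instead of this quadratic loop.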

Applications of Hierarchical Clustering

1) Taxonomy: biological classification of the animal or plant kingdom.

2) Tracking viral outbreaks: hierarchical clustering is used to trace viruses back to their sources. This helps scientists understand where a virus came from and why and how the outbreak began, potentially saving lives.

3) Evolution through phylogenetic trees: to find out how different species relate to each other, DNA sequencing and hierarchical clustering are used together. The species' DNA is sequenced, similarity is measured by calculating the distance between the sequences, and a phylogenetic tree is constructed from those distances.

4) Clustering crime sites in a city: understanding the trends in the data so that data-driven action can be taken, such as stricter laws in clusters with a high number of murders, assaults or rape cases.

Hierarchical clustering is suitable for smaller datasets and for data from fields such as biology, scientific research, market research, gene segmentation, and crime analysis in cities.

Thus, we use hierarchical clustering to establish relationships, figure out the connections among data points, and find similarities. It helps in making data-driven, strategic decisions.
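As a rough illustration of the "distance between the sequences" step in application 3, the sketch below computes pairwise distances between a few short DNA strings. The sequences and species names are invented, and the fraction of mismatching positions (Hamming distance on equal-length strings) is only one simple assumption; real phylogenetic work usually aligns the sequences first and uses more elaborate distance models.

```python
from itertools import combinations


def hamming_fraction(s, t):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


# Hypothetical, already-aligned sequences (not real data).
sequences = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTAA",
    "species_c": "ACGTTCGTTA",
    "species_d": "TGCATGCATG",
}

# Pairwise distance for every pair of species.
distances = {
    (x, y): hamming_fraction(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}

# The pair with the smallest distance is the first one an agglomerative
# method would merge when building the tree.
closest = min(distances, key=distances.get)
```

Here `closest` is the pair `species_a` and `species_b`, which differ at a single position, while `species_d` is far from all the others and would join the tree last.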

answered Feb 2 by Nandini
