Hierarchical Clustering

Question

I have read some resources and learned how hierarchical clustering works. However, when I compare it with k-means clustering, it seems to me that k-means actually produces a specific number of clusters, whereas hierarchical clustering only shows me how the samples could be clustered. What I mean is that hierarchical clustering does not give me a specific number of clusters; I only get a scheme of how the clusters could be formed and of how closely the samples are related to each other.

So I do not understand where this clustering method would be useful.

Nandini · Answer 1 · Feb 2, 2022

K-means and hierarchical clustering are both distance-based clustering algorithms: they group data points according to their similarity, measured with a distance metric.

The two algorithms differ in how the clusters are formed.

In k-means clustering, the number of clusters k is defined by the user. k is a hyperparameter that has to be chosen before the algorithm runs, which requires some prior insight into the data, so a good understanding of the dataset is needed when working with k-means.
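To make the contrast concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python; it is an illustration, not a reference implementation. The function name kmeans and its parameters are hypothetical, and for simplicity it assumes the first k points serve as the initial centroids (real implementations usually use random or k-means++ initialization). The thing to notice is that k must be passed in before anything is computed.

```python
import math

def kmeans(points, k, iterations=100):
    """Minimal Lloyd's k-means: the number of clusters k must be given up front."""
    # Simplifying assumption: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Running it with k=3 on the same data would return three clusters whether or not the data really contains three groups, which is why choosing k needs some foresight.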

Hierarchical clustering, and agglomerative clustering in particular, works differently: every data point starts as its own cluster, the clusters that are most similar according to the similarity metric are merged, and this step is repeated until a single cluster remains. Hierarchical clustering does not require prior knowledge of the number of clusters, and outliers are easy to spot because the result is visualized as a dendrogram, the tree diagram used to represent hierarchical clustering.
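A minimal sketch of this agglomerative procedure in plain Python, assuming Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); complete and average linkage are equally common alternatives. The function names are hypothetical. The list of merges it returns is exactly what a dendrogram draws, and it also shows how to get a specific number of clusters after the fact: replay the merges until k clusters remain, which for single linkage is equivalent to cutting the dendrogram at a chosen height.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of their points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, linkage=single_linkage):
    """Naive agglomerative clustering.

    Returns the merges as (cluster_a, cluster_b, distance), i.e. the
    information a dendrogram draws. Clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the pair of current clusters that are closest under the linkage.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage([points[i] for i in pair[0]],
                                     [points[i] for i in pair[1]]),
        )
        d = linkage([points[i] for i in a], [points[i] for i in b])
        merges.append((a, b, d))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return merges

def cut(n_points, merges, k):
    """Replay the merges until k clusters remain ("cutting" the dendrogram)."""
    clusters = [(i,) for i in range(n_points)]
    for a, b, _ in merges[: n_points - k]:
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return sorted(clusters)

points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
merges = agglomerative(points)
print(cut(len(points), merges, k=2))  # [(0, 1, 2, 3), (4,)]
```

The same merge list can be cut at k = 2, 3 or any other value without re-running the clustering. Here the cut at k = 2 leaves the point (20, 20) on its own, which is the kind of outlier that stands out in a dendrogram.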

Applications of Hierarchical Clustering

1) Taxonomy: biological classification of the animal and plant kingdoms.

2) Tracking viral outbreaks: hierarchical clustering is used to trace viruses back to their sources. This helps scientists understand where an outbreak originated and why and how it began, potentially saving lives.

3) Evolution through phylogenetic trees: to find how different species relate to each other, DNA sequencing and hierarchical clustering are used together. The DNA sequences of the species are generated, the similarity between species is measured by calculating the distance between their sequences, and a phylogenetic tree is constructed from these distances.

4) Clustering crime sites in a city: understanding the trends in the data allows data-driven actions to be taken, for example identifying the need for stricter laws in clusters with a high number of murders, assaults, or rape cases.
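Item 3 can be sketched in a few lines of Python, as a toy illustration under strong assumptions: the sequences are already aligned and of equal length, the distance is a plain Hamming count of differing positions (real phylogenetic work uses sequence alignment and substitution models), and the tree is built with UPGMA, which is average-linkage agglomerative clustering (neighbor-joining is another common choice). The sequences and the names hamming and upgma are hypothetical, and the tree is returned as nested tuples without branch lengths.

```python
from itertools import combinations

def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))

def upgma(seqs):
    """Return the UPGMA merge order as a nested-tuple tree (topology only)."""
    # Each cluster is (tree, list of member names); start with one leaf per species.
    clusters = [(name, [name]) for name in seqs]

    def avg_dist(c1, c2):
        # Average linkage: mean pairwise distance between the two clusters' members.
        total = sum(hamming(seqs[a], seqs[b]) for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Made-up, pre-aligned toy sequences for four hypothetical species.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACGA", "D": "TGGATCCA"}
print(upgma(seqs))  # ('D', ('C', ('A', 'B')))
```

A and B, which differ at a single position, are joined first, and D, the most divergent sequence, joins the tree last.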

Hierarchical clustering is best suited to smaller datasets, and to data from fields such as biology, scientific research, market research, gene segmentation, and the study of crime in cities.
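A rough back-of-the-envelope estimate of why size matters: standard agglomerative clustering works from the distances between all pairs of points, and the number of pairs grows quadratically with the number of points n. Assuming each distance is stored as an 8-byte float, for n = 100,000 points:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad n = 10^5:\quad
\frac{10^5\,(10^5 - 1)}{2} = 4{,}999{,}950{,}000 \approx 5 \times 10^9 \ \text{distances}

5 \times 10^9 \times 8\ \text{bytes} = 4 \times 10^{10}\ \text{bytes} \approx 40\ \text{GB}
```

On top of that, a naive implementation searches all cluster pairs at each of its n-1 merges, which costs roughly O(n^3) time. Faster algorithms exist for particular linkages (for example SLINK for single linkage runs in O(n^2) time), but the cost still grows much faster than for k-means, whose cost per iteration is linear in n.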

In short, hierarchical clustering is used to establish relationships, uncover the connections among data points, and measure their similarity. It helps in making data-driven and strategic decisions.