Use a distance formula other than Euclidean distance in k-means

0 votes
I am working with latitude-longitude data.

My objective is to make clusters based on the distance between two points.

The distance between two points is given by =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371 (latitudes and longitudes in radians; 6371 is the Earth's radius in km).
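As a rough illustration of the formula above (written in Python rather than R; the function name `great_circle_km` is made up for this sketch, and inputs are assumed to be in decimal degrees rather than radians), it could be expressed as:

```python
import math

EARTH_RADIUS_KM = 6371.0  # same constant as in the formula above


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines; inputs assumed to be decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


# A quarter of the way around the equator.
quarter_equator = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

The clamp is a defensive choice: for two nearly identical points the cosine can come out marginally above 1 in floating point.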

How can I use k-means in R with this distance? Is there any way to override the distance calculation in that process?
Jun 21, 2018 in Data Analytics by DataKing99
• 8,250 points
1,949 views

1 answer to this question.

0 votes

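K-means is based on variance minimization. The sum of within-cluster variances equals the sum of squared Euclidean distances to the cluster means, but this equivalence does not hold for other distances. If you simply swap in another distance, the mean is no longer guaranteed to be the best center, and the algorithm may fail to converge.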
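A small sketch of why the mean stops being a sensible center under great-circle distance. This is Python rather than R, the two points are made up and chosen to straddle the 180° meridian, and the helper name `great_circle_km` is hypothetical:

```python
import math


def great_circle_km(p, q, radius_km=6371.0):
    """Spherical law of cosines between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.acos(max(-1.0, min(1.0, c))) * radius_km


# Two made-up points about 2 degrees apart, on either side of the antimeridian.
points = [(0.0, 179.0), (0.0, -179.0)]

# The coordinate-wise mean (what k-means would compute on raw lat/lon).
mean_center = (sum(p[0] for p in points) / len(points),
               sum(p[1] for p in points) / len(points))

# Total great-circle distance to the mean versus to one of the data points.
cost_mean = sum(great_circle_km(p, mean_center) for p in points)
cost_point = sum(great_circle_km(p, points[0]) for p in points)
```

Here the arithmetic mean lands at longitude 0, on the opposite side of the globe, so `cost_mean` is far larger than `cost_point`.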

If you want a k-means-like algorithm for other distances (where the mean is not an appropriate estimator), use k-medoids (PAM). In contrast to k-means, k-medoids converges with arbitrary distance functions. In R, pam() from the cluster package accepts a precomputed dissimilarity matrix, so you can supply your own distances.
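To make the idea concrete, here is a minimal Python sketch of a k-medoids-style loop with a pluggable distance function. It is a simple alternating assign/update scheme, not the full PAM swap search, and the function names, the naive "first k points" initialisation, and the sample coordinates are all assumptions made for illustration:

```python
import math


def great_circle_km(p, q, radius_km=6371.0):
    """Spherical law of cosines between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.acos(max(-1.0, min(1.0, c))) * radius_km


def k_medoids(points, k, dist, max_iter=100):
    """Alternating k-medoids: centers are always actual data points."""
    medoids = list(range(k))  # naive initialisation: indices of the first k points
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for old, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(old)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    labels = [min(range(len(medoids)),
                  key=lambda idx: dist(p, points[medoids[idx]]))
              for p in points]
    return [points[m] for m in medoids], labels


# Approximate (lat, lon) of six cities, alternating between two regions.
cities = [(48.86, 2.35), (28.61, 77.21), (51.51, -0.13),
          (19.08, 72.88), (50.85, 4.35), (12.97, 77.59)]
medoids, labels = k_medoids(cities, k=2, dist=great_circle_km)
```

Because only pairwise distances are used, any dissimilarity can be passed as `dist`; the cost is that each update step is quadratic in the cluster size.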

For Manhattan distance, you can also use k-medians. The median is the appropriate estimator for the L1 norm (the median minimizes the sum of absolute differences; the mean minimizes the sum of squared differences).
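The median/mean distinction can be checked numerically. The following Python sketch uses made-up one-dimensional values with one outlier; the helper names are hypothetical:

```python
from statistics import mean, median

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one outlier


def sum_abs(center):
    """Sum of absolute differences (the L1 objective)."""
    return sum(abs(v - center) for v in values)


def sum_sq(center):
    """Sum of squared differences (the k-means objective)."""
    return sum((v - center) ** 2 for v in values)


m = mean(values)      # 22.0
med = median(values)  # 3.0

l1_at_median, l1_at_mean = sum_abs(med), sum_abs(m)
l2_at_median, l2_at_mean = sum_sq(med), sum_sq(m)
```

The median gives the smaller L1 total and the mean gives the smaller squared total, which is why k-medians pairs with Manhattan distance and k-means with squared Euclidean distance.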

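For your particular use case, you could also convert each latitude/longitude pair to a point in 3D Cartesian space, then use (squared) Euclidean distance and thus ordinary k-means. But your cluster centers will be somewhere underground, because the mean of several points on a sphere lies inside the sphere.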
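A hedged Python sketch of the 3D idea, assuming a unit sphere and decimal-degree inputs; the function names `to_xyz` and `to_latlon` and the two sample points are invented for illustration. It shows that the averaged center has length below 1 (so it sits under the surface) and how it could be projected back to a latitude/longitude:

```python
import math


def to_xyz(lat, lon):
    """Convert (lat, lon) in degrees to a point on the unit sphere."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def to_latlon(x, y, z):
    """Project a 3D point back onto the sphere and return (lat, lon) in degrees."""
    norm = math.sqrt(x * x + y * y + z * z)
    return (math.degrees(math.asin(z / norm)),
            math.degrees(math.atan2(y, x)))


# Two made-up points on the equator, either side of the antimeridian.
points = [(0.0, 179.0), (0.0, -179.0)]
xyz = [to_xyz(lat, lon) for lat, lon in points]

# The k-means style center: the coordinate-wise mean in 3D.
center = tuple(sum(axis) / len(xyz) for axis in zip(*xyz))

# Length below 1 means the center lies inside the sphere ("underground").
center_radius = math.sqrt(sum(c * c for c in center))

# Project the center back to the surface to get a usable lat/lon.
surface_lat, surface_lon = to_latlon(*center)
```

The straight-line (chord) distance in 3D grows monotonically with the great-circle distance, so nearest-center assignments agree between the two; only the centers themselves need projecting back to the surface.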

answered Jun 21, 2018 by Sahiti
• 6,370 points
