I am working with latitude-longitude data.

My objective is to make clusters based on the distance between two points.

Now distance between two different point is =ACOS(SIN(lat1)*SIN(lat2)+COS(lat1)*COS(lat2)*COS(lon2-lon1))*6371

How to use k means in R. Is there any way I can override distance calculation in that process?
Jun 21, 2018 1,082 views

## 1 answer to this question.

K-means is based on variance minimization. The sum-of-variance formula equals the sum of squared Euclidean distances, but the converse, for other distances, will not hold.

If you want to have a k-means like an algorithm for other distances (where the mean is not an appropriate estimator), use k-medoids (PAM). In contrast to k-means, k-medoids will converge with arbitrary distance functions!

For Manhattan distance, you can also use K-medians. The median is an appropriate estimator for L1 norms (the median minimizes the sum-of-differences; the mean minimizes the sum-of-squared-distances).

For your particular use case, you could also transform your data into 3D space, then use (squared) Euclidean distance and thus k-means. But your cluster centers will be somewhere underground!

• 6,380 points

## How to use a function to repeat a set of procedures on specific set of columns in a data frame?

You can parse the strings to symbols. ...READ MORE

## In a dpylr pipline how to use sample and seq?

For avoiding rowwise(), I prefer to use ...READ MORE

## How to use group by for multiple columns in dplyr, using string vector input in R?

data = data.frame(   zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),   zbc123qws1 ...READ MORE

+1 vote

## Which function can I use to clear the console in R and RStudio ?

Description                   Windows & Linux           Mac Clear console                      Ctrl+L ...READ MORE

+1 vote

## k means vs KNN

K-means clustering is basically an unsupervised clustering ...READ MORE

+1 vote

## How to handle Nominal Data?

Nominal data is basically data which can ...READ MORE

## How to handle outliers

There are multiple ways to handle outliers ...READ MORE

## On a given dataset would time taken to train n - random forest be equal to time taken to train n X (Decision tree)

No, the time to train the random ...READ MORE

+1 vote