Published on Sep 15, 2014

Euclidean Similarity calculates the distance between two users and then converts that distance into a similarity. This makes sense if you think of users as points in a space with many dimensions (one dimension per item), whose coordinates are the users' preference values. The metric calculates the Euclidean distance (d) between two such user points. In the table, the distance for User 1 is 0. Similarity is then calculated with the formula 1/(1 + d), where d is the distance, so the closer two users are, the closer their similarity is to 1. As the table shows, for User 1, where the distance is 0, the similarity is:

1/(1 + 0) = 1

For User 2:

Similarity = 1/(1 + 3.937) = 1/4.937 ≈ 0.203

This is how the metric turns the distance between two users into a similarity.
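A minimal Java sketch of the calculation above, assuming each user's preferences are stored as a map from item ID to rating and that only items rated by both users contribute to the distance (the post does not say how unrated items are handled). The class name, method names and example ratings are hypothetical, and the code follows the 1/(1 + d) formula as stated here rather than reproducing the source of Mahout's own Euclidean similarity class.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of Euclidean similarity: users are points whose
 * coordinates are preference values, d is the Euclidean distance between
 * them, and similarity = 1 / (1 + d).
 */
final class EuclideanSimilaritySketch {

    /** Euclidean distance over the items both users have rated (an assumption). */
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
            }
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Maps a distance d >= 0 to a similarity in (0, 1]; d = 0 gives exactly 1. */
    static double fromDistance(double d) {
        return 1.0 / (1.0 + d);
    }

    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        return fromDistance(distance(a, b));
    }

    public static void main(String[] args) {
        // The two worked values from the text.
        System.out.println(String.format(Locale.ROOT, "%.3f", fromDistance(0.0)));   // 1.000
        System.out.println(String.format(Locale.ROOT, "%.3f", fromDistance(3.937))); // 0.203

        // Hypothetical ratings: the users share item1 and item2, so d = sqrt(3^2 + 0^2) = 3.
        Map<String, Double> user1 = Map.of("item1", 5.0, "item2", 3.0, "item3", 4.0);
        Map<String, Double> user2 = Map.of("item1", 2.0, "item2", 3.0);
        System.out.println(similarity(user1, user2)); // 0.25
    }
}
```

One caveat of this sketch: two users with no items in common get d = 0 and therefore a similarity of 1, so a real recommender would need to treat that case separately.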

Cosine Similarity

The Cosine Measure Similarity is another similarity metric that treats user preferences as points (vectors) in space; instead of the distance between two users' points, it looks at the angle between their preference vectors. The Cosine Measure Similarity is commonly referenced in collaborative filtering research.
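To make the cosine measure concrete, here is a small Java sketch, assuming each user's ratings are held in arrays aligned by item (index i refers to the same item for both users). It computes cos(theta) = (a · b) / (|a| |b|); the class name and the example ratings are hypothetical.

```java
/**
 * Hypothetical sketch of the cosine measure between two users'
 * preference vectors, aligned by item index.
 */
final class CosineSimilaritySketch {

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];     // a . b
            normA += a[i] * a[i];   // |a|^2
            normB += b[i] * b[i];   // |b|^2
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        double[] user1 = {5.0, 3.0, 4.0}; // hypothetical ratings for three items
        double[] user2 = {4.0, 3.0, 5.0};
        System.out.println(cosine(user1, user2)); // 0.98
    }
}
```

A value of 1 means the vectors point in exactly the same direction, 0 means they are orthogonal, and the length of each vector has no effect on the result.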

If you look for a cosine similarity in the Mahout framework, you won't find it there. Mathematically, however, the cosine measure is closely related to Pearson's formula: the Pearson correlation of two preference vectors equals the cosine of the angle between them once each vector has been centered on its own mean. So in Mahout you can implement this similarity metric simply by using PearsonCorrelationSimilarity, which reflects the angle between the two vectors. Based on that angle, it tells you how close the two users (or items) are.
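The link to Pearson's formula can be checked numerically. The Java sketch below, using hypothetical ratings, computes the textbook Pearson correlation and, separately, the cosine of the two vectors after subtracting each vector's own mean; the two results agree. It is plain Java for illustration, not Mahout's PearsonCorrelationSimilarity, which works from a DataModel and may handle details such as missing ratings differently.

```java
/**
 * Hypothetical sketch: Pearson's r equals the cosine of the angle between
 * two vectors once each has been centered on its own mean.
 */
final class PearsonAsCosineSketch {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Returns a copy of x with its mean subtracted from every entry. */
    static double[] center(double[] x) {
        double m = mean(x);
        double[] centered = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            centered[i] = x[i] - m;
        }
        return centered;
    }

    /** Cosine of the angle between two non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Textbook Pearson correlation coefficient of two equal-length series. */
    static double pearson(double[] x, double[] y) {
        double meanX = mean(x), meanY = mean(y);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] user1 = {5.0, 3.0, 4.0}; // hypothetical ratings
        double[] user2 = {4.0, 3.0, 5.0};
        System.out.println(pearson(user1, user2));                // 0.5
        System.out.println(cosine(center(user1), center(user2))); // 0.5
        System.out.println(cosine(user1, user2));                 // 0.98 (raw, uncentered)
    }
}
```

For these ratings the raw cosine is 0.98 while the centered (Pearson) value is 0.5: centering removes each user's average rating, so the comparison focuses on whether the two users rate the same items above or below their own averages.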

Euclidean Distance

As the diagram above shows, Euclidean distance covers the direct (straight-line) distance between the vectors, that is, between their coordinate points. Here, however, we are concerned only with the angle between the vectors. Don't be surprised if you are not able to find Cosine Similarity in Mahout.
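The difference between the two views can be seen with two hypothetical vectors that point in the same direction but have different lengths. In this Java sketch (class and method names assumed for illustration) their Euclidean distance is large, so 1/(1 + d) is low, while the angle between them is 0 and their cosine similarity is 1.

```java
import java.util.Locale;

/**
 * Hypothetical sketch contrasting Euclidean distance (straight-line gap
 * between two points) with the cosine measure (angle between vectors).
 */
final class DistanceVersusAngleSketch {

    /** Straight-line (Euclidean) distance between two equal-length vectors. */
    static double euclidean(double[] a, double[] b) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine of the angle between two non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Angle in degrees; the cosine is clamped so rounding cannot push it outside [-1, 1]. */
    static double angleDegrees(double[] a, double[] b) {
        double c = Math.max(-1.0, Math.min(1.0, cosine(a, b)));
        return Math.toDegrees(Math.acos(c));
    }

    public static void main(String[] args) {
        // Same direction, different lengths (hypothetical values).
        double[] a = {1.0, 1.0};
        double[] b = {5.0, 5.0};
        double d = euclidean(a, b);
        System.out.println(String.format(Locale.ROOT, "distance = %.3f", d));                 // distance = 5.657
        System.out.println(String.format(Locale.ROOT, "1/(1 + d) = %.3f", 1.0 / (1.0 + d)));  // 1/(1 + d) = 0.150
        System.out.println(String.format(Locale.ROOT, "cosine = %.3f", cosine(a, b)));        // cosine = 1.000
        System.out.println(String.format(Locale.ROOT, "angle = %.3f", angleDegrees(a, b)));   // angle = 0.000
    }
}
```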

Got a question for us? Please mention it in the comments section and we will get back to you.

Related Posts

Introduction to Clustering in Mahout

ClickStream Data for Analytics

Head-start Machine Learning with Mahout

Fuzzy K-Means Clustering in Mahout
