Published on Feb 22,2017

Apache Mahout is a machine learning project developed by the Apache Software Foundation. It provides implementations of algorithms for collaborative filtering, clustering and classification. The following machine learning use cases show what each of these families of algorithms is used for.


Most of us have seen this behavior on YouTube: when a logged-in user finishes watching a video, the site immediately recommends other videos based on the one just watched.

YouTube makes a lot of recommendations, and the key purpose is to provide a better user experience, keep users in the application longer, encourage them to explore it further, and drive more conversions. Retailers and e-commerce applications also rely heavily on recommendations, and those are aimed purely at getting more conversions. YouTube uses its recommendation system to bring a user videos that it believes the user will be interested in. The recommendations are designed to:

  • Increase the number of videos users watch
  • Increase the length of time they spend on the site, and
  • Make their YouTube experience as enjoyable as possible

How does it work?

To produce personalized recommendations, YouTube’s recommendation system combines association rules for related videos with the user’s personal activity on the site. Drawing on the user’s personal and historical activity lets YouTube give better recommendations.

YouTube relies on several factors, including but not limited to the following:

  • The history of videos watched, subject to a threshold such as a cutoff date. After all, you don’t want to count videos watched 2 years ago.
  • YouTube also emphasizes videos that were explicitly liked, added to favourites, given a good rating, or added to a playlist. Together, these videos are known as the seed set.
  • To compute the candidate recommendations for a seed set, YouTube then expands it to the related videos.
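
The seed-set idea can be illustrated with a minimal Python sketch. It is not YouTube's actual system: the function names, the 365-day cutoff, the video ids, and the `related_videos` lookup are all hypothetical, and the sketch assumes that recent watches and explicitly endorsed videos both go into the seed set.

```python
from datetime import date, timedelta


def build_seed_set(watch_history, liked, favourited, playlisted, today, max_age_days=365):
    """Collect recently watched videos plus explicitly endorsed ones."""
    cutoff = today - timedelta(days=max_age_days)
    # Ignore watches older than the cutoff date (the "threshold" in the text).
    recent = {video for video, watched_on in watch_history if watched_on >= cutoff}
    return recent | set(liked) | set(favourited) | set(playlisted)


def candidate_recommendations(seed_set, related_videos, already_watched):
    """Expand the seed set to related videos, dropping what the user already knows."""
    candidates = set()
    for video in seed_set:
        candidates.update(related_videos.get(video, []))
    return candidates - seed_set - set(already_watched)


# Hypothetical data, for illustration only.
today = date(2017, 2, 22)
watch_history = [("v1", date(2017, 1, 10)), ("v2", date(2014, 6, 1))]
related_videos = {"v1": ["v4", "v5"], "v2": ["v7"], "v3": ["v5", "v6", "v1"]}

seed = build_seed_set(watch_history, liked=["v3"], favourited=[], playlisted=[], today=today)
candidates = candidate_recommendations(seed, related_videos, already_watched=["v1", "v2"])
print(sorted(seed))        # ['v1', 'v3']
print(sorted(candidates))  # ['v4', 'v5', 'v6']
```

The old watch of `v2` falls outside the cutoff, so its related video `v7` never becomes a candidate.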

YouTube mostly makes item-based recommendations; that is, it relies on item-to-item relationships. Many other platforms depend on recommendations in a similar way.
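
One simple way to derive item-to-item relationships is to count how often two items appear together in the same viewing session. The sketch below is plain Python, not Mahout's API and not YouTube's method; the session data and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count how many sessions contain each pair of items."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def similar_items(item, counts, top_n=3):
    """Rank other items by co-occurrence with `item` (ties broken by name)."""
    neighbours = counts.get(item, {})
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical viewing sessions: each list is what one user watched together.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]]
counts = cooccurrence_counts(sessions)
print(similar_items("a", counts))           # ['b', 'c', 'd']
print(similar_items("b", counts, top_n=2))  # ['a', 'c']
```

Raw co-occurrence favours popular items, so production systems usually normalise the counts; that step is left out here to keep the sketch short.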

Wine Recommendation

More than 2 million consumers look for the answer to this question every day: What wine will I enjoy?


  • Mysterious ratings and verbose, adjective-laden reviews do little to help in deciding which wine to buy.
  • Reviewers can’t even agree amongst themselves.
  • Next Glass solves this problem by removing subjectivity and applying science to deliver recommendations based on your previous ratings.

Next Glass is a third-party company whose purpose is to recommend the next glass of wine a user should try. Several third-party companies recommend wine, but Next Glass goes about it in an entirely different way: it takes DNA samples of millions of wine flavors, matches these flavor DNA samples against the user’s historical information, and makes recommendations based on the matching samples.
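
Matching flavor profiles against a user's history is a form of content-based recommendation. The following Python sketch shows the general idea only; it is not Next Glass's actual method, and the three flavor dimensions, the wine profiles, and the ratings are invented for illustration. It assumes at least one rating is greater than zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def taste_profile(ratings, profiles):
    """Rating-weighted average of the flavor profiles the user has rated."""
    total = sum(ratings.values())
    dims = len(next(iter(profiles.values())))
    return [sum(ratings[w] * profiles[w][i] for w in ratings) / total for i in range(dims)]


def recommend_wines(ratings, profiles, top_n=2):
    """Rank unrated wines by similarity to the user's taste profile."""
    profile = taste_profile(ratings, profiles)
    unrated = [w for w in profiles if w not in ratings]
    return sorted(unrated, key=lambda w: -cosine(profile, profiles[w]))[:top_n]


# Hypothetical flavor profiles: (sweetness, acidity, tannin), each in [0, 1].
profiles = {
    "merlot": [0.2, 0.4, 0.7],
    "cabernet": [0.1, 0.5, 0.9],
    "riesling": [0.8, 0.7, 0.1],
    "moscato": [0.9, 0.4, 0.0],
    "syrah": [0.2, 0.5, 0.8],
}
ratings = {"merlot": 5, "riesling": 1}  # the user's previous ratings

print(recommend_wines(ratings, profiles))  # ['syrah', 'cabernet']
```

Because the user rated the merlot far higher than the riesling, the taste profile leans towards tannic reds, and the sweet moscato ranks last.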

Other third parties, such as Wine Enthusiast, Wine Spectator and Robert Parker, also make wine recommendations, though each of these companies may follow a different approach.

Fraud Detection

Fraud detection is one of the data mining aspects of machine learning, and another domain that benefits from it. Credit card fraud is committed in different ways these days. It could be phishing, in which fraudsters steal users’ credentials and then use them illegally, or ‘carding’, in which they generate simulated cards and other credentials through various techniques.

Credit card fraud also means different things from the consumer’s side and from the provider’s side, and it can happen in different ways. Classification systems can be used to identify such fraud. A classifier can predict which transactions are likely to be determined fraudulent later, based on known examples of purchase behavior and other credit card transactions.
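
To make the idea of learning from known examples concrete, here is a minimal nearest-centroid classifier in plain Python. It is a deliberately simple stand-in and not one of Mahout's classifiers; the two features (amount relative to the cardholder's typical spend, and a flag for a foreign merchant) and all the labelled transactions are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Average the feature vectors of each class in the labelled examples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(features, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    def distance(label):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, centroids[label])))
    return min(centroids, key=distance)


# Hypothetical labelled transactions: (amount_ratio, foreign_merchant_flag).
known_transactions = [
    ((0.5, 0.0), "legitimate"),
    ((1.0, 0.0), "legitimate"),
    ((1.5, 1.0), "legitimate"),
    ((8.0, 1.0), "fraudulent"),
    ((6.0, 1.0), "fraudulent"),
    ((7.0, 0.0), "fraudulent"),
]
centroids = train_centroids(known_transactions)

print(classify((0.8, 0.0), centroids))  # legitimate
print(classify((9.0, 1.0), centroids))  # fraudulent
```

Real fraud data is heavily imbalanced and needs far richer features, so this only shows the shape of the workflow: train on labelled history, then score new transactions.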

News Clustering

Mahout offers several algorithms for clustering, ranging from the basic ones to the most advanced ones, and including probabilistic clustering techniques. News websites, for example, cluster news articles based on the content of each article, and that is where clustering comes in.
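
As a rough illustration of content-based clustering, the sketch below runs k-means over bag-of-words vectors in plain Python. It does not use Mahout; the headlines are invented, and seeding the centroids with the first two headlines is an assumption made only to keep the result deterministic.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Turn a text into a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Hypothetical headlines, for illustration only.
headlines = [
    "team wins cup final",
    "election vote result announced",
    "striker scores in cup match",
    "parliament vote on election law",
]
vocabulary = sorted({word for h in headlines for word in h.split()})
points = [bag_of_words(h, vocabulary) for h in headlines]

clusters = k_means(points, initial_centroids=points[:2])
print(clusters)  # [0, 1, 0, 1]
```

The two sports headlines share the word "cup" and the two politics headlines share "election" and "vote", which is enough for them to group together here. Real articles would need stop-word removal and term weighting such as TF-IDF.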

Got a question for us? Mention it in the comments section and we will get back to you.

