
Fuzzy K-Means Clustering in Mahout

Last updated on Nov 15, 2022


Fuzzy K-Means is an extension of K-Means, the popular and simple clustering technique. The difference is that, instead of assigning each point exclusively to one cluster, it allows some fuzziness, or overlap, so that a point can belong to two or more clusters. The key points describing Fuzzy K-Means are:

  • Unlike K-Means, which seeks hard clusters in which each point belongs to exactly one cluster, Fuzzy K-Means seeks softer clusters that can overlap.
  • A single point in a soft cluster can belong to more than one cluster, with a certain affinity value towards each of the clusters.
  • The affinity depends on the distance of that point from the cluster centroid: the closer the point is to a centroid, the higher its affinity for that cluster.
  • Like K-Means, Fuzzy K-Means works on objects that have a distance measure defined and that can be represented in an n-dimensional vector space.
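
To make the idea of affinity concrete, here is a minimal Python sketch of the membership formula commonly used in fuzzy clustering (fuzzy c-means). It is an illustration, not Mahout's code: the function name `memberships` and the use of plain (non-squared) distances are assumptions made for this example.

```python
def memberships(distances, m=2.0):
    """Return the affinity of one point towards each cluster.

    distances: plain (non-squared) distance from the point to each centroid.
    m: fuzziness factor; it must be greater than 1.
    """
    if m <= 1.0:
        raise ValueError("fuzziness factor m must be greater than 1")

    # A point lying exactly on one or more centroids belongs only to those.
    on_centroid = [d == 0.0 for d in distances]
    if any(on_centroid):
        share = 1.0 / sum(on_centroid)
        return [share if hit else 0.0 for hit in on_centroid]

    # Standard fuzzy c-means membership: closer centroids get a larger share,
    # and the shares over all clusters add up to 1.
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# A point at distance 1 from the first centroid and 3 from the second.
u = memberships([1.0, 3.0], m=2.0)
print(u)
```

With m = 2 the point in the example belongs mostly to the nearer cluster but keeps a small affinity for the farther one. Values of m closer to 1 push the result towards the hard assignment that K-Means makes.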

Fuzzy K-Means MapReduce Flow

The MapReduce flows of K-Means and Fuzzy K-Means differ very little, and the two implementations in Mahout are similar.
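
The following Python sketch simulates one such iteration in memory, to show where the flow would differ from K-Means: the map step emits a weighted contribution to every cluster instead of a single contribution to the nearest one. It is a simplified illustration under stated assumptions, not Mahout's implementation. All function names are hypothetical, and squared Euclidean distance is assumed.

```python
from collections import defaultdict


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(point, centroids, m):
    """Affinity of a point towards each centroid, from squared distances."""
    d2 = [squared_euclidean(point, c) for c in centroids]
    if 0.0 in d2:
        hits = d2.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in d2]
    exponent = 1.0 / (m - 1.0)  # distances are already squared
    return [1.0 / sum((dj / dk) ** exponent for dk in d2) for dj in d2]


def map_point(point, centroids, m):
    """Map: emit (cluster_id, (weight, weighted_point)) for every cluster."""
    for j, u in enumerate(memberships(point, centroids, m)):
        w = u ** m
        yield j, (w, [w * x for x in point])


def reduce_cluster(values):
    """Reduce: combine one cluster's partial sums into a new centroid."""
    total_w, total = 0.0, None
    for w, weighted_point in values:
        total_w += w
        if total is None:
            total = list(weighted_point)
        else:
            total = [a + b for a, b in zip(total, weighted_point)]
    if total is None or total_w == 0.0:
        return None  # no point has any affinity for this cluster
    return [x / total_w for x in total]


def one_iteration(points, centroids, m=2.0):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        for j, value in map_point(p, centroids, m):
            grouped[j].append(value)
    new_centroids = []
    for j, old in enumerate(centroids):
        updated = reduce_cluster(grouped[j])
        new_centroids.append(old if updated is None else updated)
    return new_centroids


points = [[0.0], [1.0], [9.0], [10.0]]
new_centroids = one_iteration(points, [[0.0], [10.0]], m=2.0)
print(new_centroids)
```

In a hard K-Means version of this sketch, `map_point` would emit a single pair with weight 1 for the nearest centroid; the reduce step would stay the same.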

The essential parameters for running Fuzzy K-Means are:

  • A Vector data set as input.
  • The RandomSeedGenerator, to seed the initial k clusters.
  • A distance measure, here SquaredEuclideanDistanceMeasure.
  • A large convergence threshold, such as -cd 1.0, if the squared value of the distance measure is used.
  • A value for maxIterations; the default value is -x 10.
  • The fuzziness factor (also called the coefficient of normalization), -m, with a value greater than 1.0.
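
The sketch below shows, in plain Python, how parameters of this kind typically control an iterative clustering run: k random seeds, a convergence threshold compared against how far the centroids moved, and a cap on the number of iterations. It is a hedged illustration, not Mahout's driver. The function `run_clustering` and the `update` callback are hypothetical names, and the fuzziness factor m is assumed to live inside whatever `update` function is passed in.

```python
import random


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_clustering(points, k, update, convergence_delta=1.0,
                   max_iterations=10, seed=0):
    """Iterate `update` until the centroids settle or the cap is reached.

    update(points, centroids) must return the recomputed centroids.
    Returns the final centroids and the number of iterations performed.
    """
    # Seed the initial k clusters with randomly chosen input points.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    iterations = 0
    for iterations in range(1, max_iterations + 1):
        new_centroids = update(points, centroids)
        # Largest movement of any centroid, measured with the squared distance.
        shift = max(
            squared_euclidean(old, new)
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= convergence_delta:
            break
    return centroids, iterations


# Toy update rule for demonstration only: move each centroid halfway to 4.0.
def halfway_to_four(points, centroids):
    return [[(c[0] + 4.0) / 2.0] for c in centroids]


final, used = run_clustering([[0.0], [8.0]], k=1, update=halfway_to_four)
print(final, used)
```

Because squared distances are compared against the threshold, a value such as 1.0 corresponds to a centroid movement of one unit in ordinary distance, which is one way to read the advice to use a large threshold with a squared measure.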


