Looking for project ideas on Hadoop for machine learning and data mining

Question

I am a graduate CS student (data mining and machine learning) with solid experience in core Java (more than 4 years). I have read a fair amount about Hadoop and Map/Reduce.

I would now like to do a project with them (in my free time, of course) to get a better understanding.

Any good project ideas would be really appreciated. I just want to do this to learn, so I don't really mind re-inventing the wheel. Anything related to data mining/machine learning would be an added bonus (it fits with my research), but it is absolutely not necessary.

Frankie · Answer 1 · Aug 14, 2018

You haven't said much about your specific interests. I know that graph mining algorithms have been implemented on top of the Hadoop framework. The software at http://www.cs.cmu.edu/~pegasus/ and the paper "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations" may give you a starting point.

There was also a NIPS 2009 workshop on a similar topic, "Large-Scale Machine Learning: Parallelism and Massive Datasets". You can browse some of the papers to get an idea.

Edit: There is also Apache Mahout (http://mahout.apache.org/): "Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm".
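
To make the map/reduce idea in that quote a bit more concrete, here is a rough sketch of a single k-means clustering iteration split into a map step, a shuffle step, and a reduce step. This is not Mahout's code and it does not use the Hadoop API: it is plain in-memory Python, and every function name and the sample data are made up for illustration. Porting something like this to real Hadoop mappers and reducers could itself be a reasonable first project.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Mapper: emit (centroid_index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def shuffle(pairs):
    """Group values by key; a stand-in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, centroids):
    """Reducer: replace each centroid with the mean of its assigned points."""
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for key, values in groups.items():
        count = sum(n for _, n in values)
        dims = len(values[0][0])
        new_centroids[key] = tuple(
            sum(point[d] for point, _ in values) / count for d in range(dims)
        )
    return new_centroids


def kmeans_iteration(points, centroids):
    """One full map -> shuffle -> reduce pass over the data."""
    return reduce_phase(shuffle(map_phase(points, centroids)), centroids)


# Hypothetical toy data: two obvious clusters and two starting centroids.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
updated = kmeans_iteration(points, centroids)
```

In a real job you would repeat this pass until the centroids stop moving, with each pass typically run as its own map/reduce job and the current centroids handed to every mapper.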