Want to have an idea on Hadoop Machine Learning and Data Mining project

0 votes
I am a graduate CS student (Data mining and machine learning) and have a good exposure to core Java (>4 years). I have read up a bunch of stuff on Hadoop and Map/Reduce

I would now like to do a project on this stuff (over my free time of course) to get a better understanding.

Any good project ideas would be really appreciated. I just wanna do this to learn, so I don't really mind re-inventing the wheel. Also, anything related to data mining/machine learning would be an added bonus (fits with my research) but absolutely not necessary.
Aug 13, 2018 in Big Data Hadoop by Neha
• 6,300 points

1 answer to this question.

0 votes

You haven't written anything about your interest. I know algorithms in graph mining has been implemented over the Hadoop framework. This software http://www.cs.cmu.edu/~pegasus/ and paper: "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations" may give you a starting point.

Further, this link discusses something similar to your question: http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/ but it is in python. And, there is a very good paper by Andrew Ng "Map-Reduce for Machine Learning on Multicore".

There was a NIPS 2009 workshop on the similar topic "Large-Scale Machine Learning: Parallelism and Massive Datasets". You can browse some of the paper and get an idea.

Edit : Also there is Apache Mahout http://mahout.apache.org/ -->" Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm"

answered Aug 13, 2018 by Frankie
• 9,810 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to analyze block placement on datanodes and rebalancing data across Hadoop nodes?

HDFS provides a tool for administrators i.e. ...READ MORE

answered Jun 21, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
0 votes
2 answers

Hey for all, how to get on large data i want use in hadoop?

Hi, To work with Hadoop you can also ...READ MORE

answered Jul 30, 2019 in Big Data Hadoop by Sunny
0 votes
1 answer

Explain to me the method to transfer data between Azure tables and Hadoop on Azure

this article on HiveStorageHandler will let you create ...READ MORE

answered May 2, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
0 votes
1 answer
0 votes
1 answer

What is Modeling data in Hadoop and how to do it?

I suggest spending some time with Apache ...READ MORE

answered Sep 19, 2018 in Big Data Hadoop by Frankie
• 9,810 points
0 votes
1 answer

I want to install snappy on Hadoop 1.2.1. How do I do that?

As per Cloudera, if you install hadoop ...READ MORE

answered Dec 11, 2018 in Big Data Hadoop by Frankie
• 9,810 points