POC for Hadoop in real time scenario

Question

I have a bit of a problem. I want to learn about Hadoop and how I might use it to handle data streams in real time. To that end, I want to build a meaningful POC around it that I can showcase when I have to prove my knowledge of it to a potential employer, or use to introduce it at my present firm.

I should also mention that I am limited in hardware resources: just my laptop and me :) I know the basics of Hadoop and have written 2-3 basic MR jobs. I want to do something more meaningful or real-world. Please suggest.

Frankie · Answer

I'd like to point out a few things.

If you want to do a POC with just one laptop, there's little point in using Hadoop.

Also, as said by other people, Hadoop is not designed for realtime applications, because there is some overhead in running Map/Reduce jobs.

That being said, Cloudera released Impala, which works with the Hadoop ecosystem (specifically the Hive metastore) to achieve realtime performance. Be aware that to achieve this it does not generate Map/Reduce jobs, and that it is currently in beta, so use it carefully.

So I would really advise looking at Impala so you can still use a Hadoop ecosystem, but if you're also considering alternatives, here are a few other frameworks that could be of use:

- Druid: was open-sourced by MetaMarkets. Looks interesting, even though I've not used it myself.
- Storm: no integration with HDFS, it just processes data as it comes.
- Yahoo S4: seems pretty close to Storm.

In the end I think you should really analyze your needs and see whether Hadoop is what you need, because it's only getting started in the realtime space. There are several other projects which could help you achieve realtime performance.

Here are some example use cases:

Finance/Insurance

- Classify investment opportunities as good or not, e.g. based on industry/company metrics, portfolio diversity and currency risk.
- Classify credit card transactions as valid or invalid, e.g. based on location of transaction and credit card holder, date, amount, purchased item or service, history of transactions and similar transactions.

Biology/Medicine

- Classification of proteins into structural or functional classes.
- Diagnostic classification, e.g. cancer tumours based on images.

Internet

- Document classification and ranking.
- Malware classification, email/tweet/web spam classification.

Production Systems (e.g. in energy or petrochemical industries)

- Classify and detect situations (e.g. sweet spots or risk situations) based on realtime and historic data from sensors.
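To make the credit card example a bit more concrete, here is a minimal sketch of what "processing data as it comes" could look like on a single laptop, before any framework is involved. Everything in it is an assumption for illustration: the `Transaction` fields, the `StreamingFraudClassifier` name, and the two rules (a country mismatch, and an amount far above the card's running average) are hypothetical and are not taken from Hadoop, Impala, Storm or any other project mentioned above. A real POC would most likely replace the hand-written rules with a trained model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record layout, chosen only for this sketch.
    card_id: str
    amount: float
    country: str


class StreamingFraudClassifier:
    """Toy rule-based classifier that sees one transaction at a time."""

    def __init__(self, home_country, amount_factor=5.0, min_history=3):
        self.home_country = home_country    # dict: card_id -> holder's country
        self.amount_factor = amount_factor  # flag amounts above factor * mean
        self.min_history = min_history      # valid transactions needed first
        self.count = defaultdict(int)       # valid transactions seen per card
        self.total = defaultdict(float)     # sum of valid amounts per card

    def classify(self, tx):
        """Return ("valid" | "invalid", list of reasons) for one transaction."""
        reasons = []

        # Rule 1: transaction location differs from the card holder's country.
        # Cards with no known home country are never flagged by this rule.
        if tx.country != self.home_country.get(tx.card_id, tx.country):
            reasons.append("foreign location")

        # Rule 2: amount is far above this card's running average.
        n = self.count[tx.card_id]
        if n >= self.min_history:
            mean = self.total[tx.card_id] / n
            if tx.amount > self.amount_factor * mean:
                reasons.append("unusual amount")

        # Only transactions judged valid update the per-card history.
        if not reasons:
            self.count[tx.card_id] += 1
            self.total[tx.card_id] += tx.amount

        return ("invalid" if reasons else "valid"), reasons


def classify_stream(classifier, transactions):
    """Lazily classify an iterable of transactions, one record at a time."""
    for tx in transactions:
        label, reasons = classifier.classify(tx)
        yield tx, label, reasons
```

You would feed `classify_stream` from whatever source you have at hand (a log file being tailed, a socket, a message queue). The state is kept per card in plain dictionaries, so this only scales as far as one machine's memory, which is exactly the point where the frameworks listed above start to become relevant.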


