Integration of Hadoop with MongoDB concept


Hi, I am new to Hadoop and NoSQL technologies. I started learning with the word-count program, reading a file stored in HDFS and processing it. Now I want to use Hadoop with MongoDB. I started with the program from here.

My confusion is that it stores MongoDB data on my local file system, reads data from the local file system into HDFS in map/reduce, and then writes it back to MongoDB on the local file system. When I studied HBase, I learned it can be configured to store its data on HDFS, so Hadoop can process it directly there (map/reduce). How do I configure MongoDB to store its data on HDFS?

I think it is a better approach to store the data in HDFS for fast processing, not in the local file system. Am I right? Please clarify my concept if I am going in the wrong direction.

Sep 24, 2018 in Big Data Hadoop by Neha

1 answer to this question.

MongoDB isn't built to work on top of HDFS, and it isn't really necessary, since MongoDB already has its own approach for scaling horizontally and working with data stored across multiple machines (sharding and replica sets).

A better approach, if you need to work with both MongoDB and Hadoop, is to use MongoDB as the source of your data but process everything in Hadoop (which will use HDFS for any temporary storage). Once you're done processing the data, you can write it back to MongoDB, S3, or wherever you want.
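To make that concrete, here is a minimal word-count sketch that reads documents straight out of MongoDB and writes the counts back, using the mongo-hadoop connector. It assumes mongo-hadoop-core (and the MongoDB Java driver) is on the classpath; the URIs, the test.documents and test.wordcounts collections, and the "text" field are placeholders, not anything from your setup.

// Word count over a MongoDB collection via the mongo-hadoop connector.
// Documents are read directly from MongoDB; nothing is staged on the
// local file system first.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class MongoWordCount {

    // Each input record is one MongoDB document: key = _id, value = the document.
    public static class TokenizerMapper
            extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, BSONObject doc, Context context)
                throws IOException, InterruptedException {
            Object text = doc.get("text"); // hypothetical field holding the text
            if (text == null) return;
            StringTokenizer it = new StringTokenizer(text.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read from one collection, write the counts to another (placeholder URIs).
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/test.documents");
        MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/test.wordcounts");

        Job job = Job.getInstance(conf, "mongo word count");
        job.setJarByClass(MongoWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run it like any other Hadoop job; the connector computes input splits against the collection itself, so there is no need to export the data into HDFS first, and Hadoop still uses HDFS for its intermediate storage.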

OR

HDFS is a distributed file system, while HBase is a NoSQL database that uses HDFS as its file system, providing fast and efficient integration with Hadoop that has been proven to work at scale. Being able to work with HBase data directly in Hadoop, or push it into HDFS, is one of the big advantages of picking HBase as a NoSQL database solution. I don't believe MongoDB provides such tight integration with Hadoop and HDFS, which would mitigate the performance and efficiency concerns of moving data to and from a database.

Please look at this blog post for a detailed analysis: https://www.edureka.co/blog/mongodb-the-database-for-big-data-processing/
answered Sep 25, 2018 by Frankie
