Integration of Hadoop with MongoDB concept

0 votes

Hi, I am new to Hadoop and NoSQL technologies. I started learning with the word-count program, which reads a file stored in HDFS and processes it. Now I want to use Hadoop with MongoDB, and I started with the program from here.

My confusion is this: the program stores the MongoDB data on my local file system, reads that data from the local file system into HDFS during map/reduce, and then writes the results back to MongoDB on the local file system. When I studied HBase, I learned that it can be configured to store its data on HDFS, so Hadoop can process it directly on HDFS (map/reduce). How can I configure MongoDB to store its data on HDFS?

I think storing the data in HDFS, rather than in the local file system, is the better approach for fast processing. Am I right? Please correct my understanding if I am going in the wrong direction.

Sep 25, 2018 in Big Data Hadoop by Neha
• 6,300 points

1 answer to this question.

0 votes
MongoDB isn't built to work on top of HDFS and it's not really necessary since Mongo already has its own approach for scaling horizontally and working with data stored across multiple machines.

A better approach if you need to work with MongoDB and Hadoop is to use MongoDB as the source of your data but process everything in Hadoop (which will use HDFS for any temporary storage). Once you're done processing the data, you can write it back to MongoDB, S3, or wherever you want.
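
To make that flow concrete, here is a minimal sketch in plain Python of the read, process, write-back pattern. It is only an illustration under stated assumptions: the `source_docs` list is a hypothetical stand-in for documents read from a MongoDB collection, the `text` field and the function names are made up for this example, and the map and reduce steps run in one process rather than on a Hadoop cluster. In a real job, a connector between MongoDB and Hadoop would handle the reading and writing.

```python
from collections import defaultdict

# Hypothetical stand-in for documents read from a MongoDB collection.
source_docs = [
    {"_id": 1, "text": "hadoop reads from mongodb"},
    {"_id": 2, "text": "hadoop writes to mongodb"},
]


def map_phase(doc):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in doc["text"].split():
        yield word, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_job(docs):
    """Read the documents, run map then reduce, and build output documents."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    counts = reduce_phase(pairs)
    # Each result becomes one document destined for the output store
    # (MongoDB, S3, or anywhere else).
    return [{"_id": word, "count": n} for word, n in sorted(counts.items())]


output_docs = run_job(source_docs)
```

The point of the sketch is that the database is touched only at the two ends of the job. Everything in between is ordinary map/reduce work, so MongoDB never needs to store its own files on HDFS.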


HDFS is a distributed file system, while HBase is a NoSQL database that uses HDFS as its file system, which provides a fast and efficient integration with Hadoop that has been proven to work at scale. Being able to work with HBase data directly in Hadoop, or to push it into HDFS, is one of the big advantages of picking HBase as a NoSQL database solution. I don't believe MongoDB provides such tight integration with Hadoop and HDFS, and that kind of integration is what would mitigate any performance and efficiency concerns about moving data to and from a database.

Please look at this blog post for a detailed analysis:
answered Sep 25, 2018 by Frankie
• 9,830 points
