How to analyze block placement on DataNodes and rebalance data across Hadoop nodes?

I want to understand how block placement on DataNodes is analyzed and how data is rebalanced across Hadoop nodes. When is rebalancing required?
Jun 21, 2018 in Big Data Hadoop by Shubham

1 answer to this question.


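HDFS provides a tool for administrators, the balancer, that analyzes block placement and rebalances data across the DataNodes. An administrator runs it manually (bin/hadoop balancer, or hdfs balancer in newer releases). The optional threshold argument (bin/hadoop balancer -threshold <percentage>) does not schedule the tool; it sets how far, in percentage points, each DataNode's disk utilization may deviate from the overall cluster utilization before that node is treated as unbalanced. The default is 10.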
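
To make the threshold idea concrete, here is a minimal Python sketch of how nodes could be grouped by comparing each node's utilization with the cluster-wide utilization. It is an illustration only, not the balancer's actual code: the DataNode class, the classify function, and the capacity figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical model of a DataNode; names and sizes are made up.
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        """Percentage of this node's capacity that is in use."""
        return 100.0 * self.used_gb / self.capacity_gb


def classify(nodes, threshold=10.0):
    """Group nodes by how their utilization compares with the cluster's."""
    total_used = sum(n.used_gb for n in nodes)
    total_capacity = sum(n.capacity_gb for n in nodes)
    cluster_util = 100.0 * total_used / total_capacity

    groups = {"over": [], "balanced": [], "under": []}
    for n in nodes:
        if n.utilization > cluster_util + threshold:
            groups["over"].append(n.name)
        elif n.utilization < cluster_util - threshold:
            groups["under"].append(n.name)
        else:
            groups["balanced"].append(n.name)
    return cluster_util, groups


nodes = [
    DataNode("dn1", 1000, 900),
    DataNode("dn2", 1000, 600),
    DataNode("dn3", 1000, 150),  # e.g. a recently added, mostly empty node
]
cluster_util, groups = classify(nodes, threshold=10.0)
print(f"cluster utilization: {cluster_util:.1f}%")
print(groups)
```

A smaller threshold gives a more evenly balanced cluster but means more blocks have to be moved before every node falls inside the allowed band.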
 

The balancer moves blocks from overutilized nodes to underutilized nodes until data is evenly distributed across the cluster, within the configured threshold. The result is a balanced data density on every DataNode.
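
The block movement described above can be simulated with a toy loop. This is a rough sketch under simplifying assumptions, not how the real balancer picks its moves: it ignores racks, replicas, and bandwidth limits, treats every block as the same size, and the rebalance function and its parameters are hypothetical names chosen for illustration.

```python
def rebalance(used, capacity, threshold=10.0, block_gb=1.0, max_moves=100_000):
    """Move fixed-size blocks from the fullest node to the emptiest node
    until every node is within `threshold` points of the cluster utilization."""
    used = dict(used)  # work on a copy so the caller's data is untouched
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())

    def util(name):
        return 100.0 * used[name] / capacity[name]

    moves = 0
    while moves < max_moves:
        src = max(used, key=util)  # most utilized node
        dst = min(used, key=util)  # least utilized node
        within_band = (util(src) - cluster_util <= threshold
                       and cluster_util - util(dst) <= threshold)
        if within_band:
            break
        used[src] -= block_gb
        used[dst] += block_gb
        moves += 1
    return used, moves


capacity = {"dn1": 1000.0, "dn2": 1000.0, "dn3": 1000.0}
before = {"dn1": 900.0, "dn2": 600.0, "dn3": 150.0}
after, moves = rebalance(before, capacity, threshold=10.0)
print(after, moves)
```

The total amount of stored data stays the same; only its placement changes, which is why the balancer can run while the cluster stays online.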
 

Rebalancing is needed in a few situations. When a new DataNode joins an HDFS cluster, it holds very little data, so a map task assigned to that machine will most likely not read local data, which increases network bandwidth use. Conversely, when some DataNodes become full, new blocks can be placed only on the non-full DataNodes, which reduces read parallelism for that data. And if a DataNode fails, its blocks must be re-replicated on the remaining nodes, which can raise data density on some of them.

Hope this answers your query to some extent.

answered Jun 21, 2018 by nitinrawat895
