How to analyze block placement on datanodes and rebalance data across Hadoop nodes?

I want to see how block placement on datanodes works and how data is rebalanced across Hadoop nodes. In which cases is rebalancing required?
Jun 21, 2018 in Big Data Hadoop by Shubham

1 answer to this question.


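HDFS provides a tool for administrators, the balancer, that analyzes block placement and rebalances data across the DataNodes. It is started manually (bin/hadoop balancer, or hdfs balancer on newer releases) and accepts an optional threshold (bin/hadoop balancer -threshold <percentage>). The threshold does not start the balancer when disk usage reaches that value; it is the target. The balancer keeps moving blocks until the utilization of every DataNode is within that many percentage points of the overall cluster utilization (the default is 10).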
 

The balancer moves blocks from overutilized nodes to underutilized nodes, so that data density ends up balanced across the cluster.
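To make "overutilized" and "underutilized" concrete, here is a minimal Python sketch of the comparison described above: each node's utilization is compared with the overall cluster utilization plus or minus a threshold. This is a simplified model for illustration, not the balancer's actual implementation; the function name, the node names and the byte counts are all made up.

```python
def classify_nodes(nodes, threshold=10.0):
    """Label each node relative to the cluster-wide utilization.

    nodes: dict mapping a (hypothetical) node name to (used, capacity).
    threshold: allowed deviation from the cluster utilization, in percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, capacity) in nodes.items():
        util = 100.0 * used / capacity
        if util > cluster_util + threshold:
            label = "over-utilized"      # candidate source of block moves
        elif util < cluster_util - threshold:
            label = "under-utilized"     # candidate target of block moves
        else:
            label = "within threshold"   # left alone in this simplified model
        labels[name] = (round(util, 1), label)
    return round(cluster_util, 1), labels


# Illustrative numbers only (used, capacity in arbitrary units).
example = {
    "dn1": (900, 1000),
    "dn2": (600, 1000),
    "dn3": (550, 1000),
    "dn4": (150, 1000),
}
cluster_util, labels = classify_nodes(example, threshold=10.0)
for name, (util, label) in labels.items():
    print(f"{name}: {util}% ({label}), cluster at {cluster_util}%")
```

With a smaller threshold more nodes fall outside the band, so the balancer has more data to move and takes longer to finish.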
 

Rebalancing is typically needed in the following cases. When a new DataNode joins an HDFS cluster, it does not hold much data, so any map task assigned to that machine most likely does not read local data, which increases network bandwidth usage. On the other hand, when some DataNodes become full, new data blocks are placed only on the non-full DataNodes, which reduces read parallelism. Finally, if a DataNode fails, its data has to be re-replicated on the remaining nodes, which can leave data density higher on some nodes than on others.
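The new-node case can be worked through with a little arithmetic. The sketch below is a rough back-of-the-envelope model, not something the balancer exposes; the function name and all numbers are hypothetical. It computes how the cluster utilization drops when an empty node is added and how much data the new node must receive at minimum to get within the threshold.

```python
def effect_of_new_node(existing, new_capacity, threshold=10.0):
    """Estimate the imbalance created by adding one empty node.

    existing: list of (used, capacity) tuples for the current nodes.
    new_capacity: capacity of the empty node being added.
    Returns (utilization before, utilization after,
             indexes of existing nodes now above the upper bound,
             minimum amount the new node must receive).
    """
    used = sum(u for u, _ in existing)
    capacity = sum(c for _, c in existing)

    util_before = 100.0 * used / capacity
    util_after = 100.0 * used / (capacity + new_capacity)

    # Existing nodes that now sit more than `threshold` points above the average.
    now_over = [i for i, (u, c) in enumerate(existing)
                if 100.0 * u / c > util_after + threshold]

    # The new node is at 0%; it must reach at least (average - threshold).
    lower_bound = max(util_after - threshold, 0.0)
    min_to_receive = lower_bound * new_capacity / 100.0
    return util_before, util_after, now_over, min_to_receive


# Three nodes at 70% each, plus one new empty node of the same size.
before, after, now_over, min_to_receive = effect_of_new_node(
    [(700, 1000), (700, 1000), (700, 1000)], new_capacity=1000, threshold=10.0
)
print(f"cluster utilization: {before}% -> {after}%")
print(f"existing nodes now over-utilized: {now_over}")
print(f"new node must receive at least {min_to_receive} units")
```

In this example the cluster utilization falls from 70% to 52.5%, so all three old nodes exceed the 62.5% upper bound even though nothing about them changed.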

Hope this answers your query to some extent.

answered Jun 21, 2018 by nitinrawat895
