How to analyze block placement on DataNodes and rebalance data across Hadoop nodes

0 votes
I want to understand how block placement on DataNodes is analyzed and how data is rebalanced across Hadoop nodes. In what cases is this required?
Jun 21, 2018 in Big Data Hadoop by Shubham
• 13,490 points
1,000 views

1 answer to this question.

0 votes

HDFS provides an administration tool, the Balancer, that analyzes block placement and rebalances data across the DataNodes. It can be run manually (bin/hadoop balancer, or hdfs balancer in newer releases), optionally with a threshold argument (bin/hadoop balancer -threshold <percent>). The threshold defines how far a DataNode's disk usage may deviate from the cluster-wide average before the node is treated as over- or under-utilized.
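For example (a minimal sketch; the threshold value and bandwidth figure below are illustrative assumptions, adjust them to your cluster):

# Run the balancer with the default threshold (10%)
hdfs balancer

# Treat any DataNode whose utilization differs from the cluster
# average by more than 5 percentage points as unbalanced
hdfs balancer -threshold 5

# Optionally cap the bandwidth (bytes per second) the balancer may use,
# so rebalancing does not slow down regular jobs
hdfs dfsadmin -setBalancerBandwidth 10485760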
 

The Balancer moves blocks from over-utilized nodes to under-utilized nodes, making sure data is evenly distributed and data density stays balanced across the cluster.
 

When a new DataNode joins an HDFS cluster, it holds very little data, so any map task assigned to that machine is unlikely to read local data, which increases network bandwidth usage. On the other hand, when some DataNodes become full, new data blocks are placed only on the non-full DataNodes, reducing read parallelism. And if a DataNode fails, its data has to be re-replicated onto the remaining nodes, which can leave data density higher on some of them. These are the typical cases where running the Balancer is required; you can check whether it is needed as shown below.
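To see whether rebalancing is actually needed, compare per-DataNode disk usage first (a quick sketch using the standard HDFS admin report):

# Prints configured capacity, DFS used and "DFS Used%" for every DataNode;
# a large spread in "DFS Used%" between nodes suggests the Balancer is worth running
hdfs dfsadmin -report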

Hope this answers your query to some extent.

answered Jun 21, 2018 by nitinrawat895
• 11,380 points
