What happens when a datanode that is dead becomes active again

Question

If a DataNode that was marked dead becomes active again, how does Hadoop get rid of the extra replica that has reappeared? Is the data on the now-active node deleted, or is one of the other replicas deleted?

score 0 · Answer 1 · Jun 20, 2019

When the NameNode has not received a heartbeat from a DataNode for a certain amount of time (10 minutes 30 seconds with the default settings), it marks that DataNode as dead. The blocks that were stored on it are now under-replicated, so the NameNode schedules new copies of them.
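The timeout is derived from two settings, `dfs.namenode.heartbeat.recheck-interval` (default 300000 ms) and `dfs.heartbeat.interval` (default 3 s), as twice the recheck interval plus ten heartbeat intervals. A small sketch of that arithmetic, where the function name is made up for illustration and is not a Hadoop API:

```python
def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds without a heartbeat before a DataNode is considered dead.

    Arguments mirror dfs.namenode.heartbeat.recheck-interval (milliseconds)
    and dfs.heartbeat.interval (seconds); the function itself is hypothetical.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


print(dead_node_timeout_seconds())  # 630.0 seconds, i.e. 10 min 30 s
```

Lowering either setting makes the NameNode declare nodes dead sooner, at the cost of more false alarms on a busy network.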

The NameNode does not copy the data itself: it instructs a DataNode that still holds a replica to copy the block to another DataNode. The transfer happens directly between DataNodes, and the data never passes through the NameNode.
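To illustrate the scheduling step, here is a toy model, assuming an in-memory map from block IDs to the set of DataNodes that hold a replica. The names `block_map` and `plan_rereplication` are invented for this sketch and are not Hadoop APIs, and the real NameNode also weighs racks, load and free space when picking targets. The planner only produces (block, source, target) commands; the copy itself would go from source to target.

```python
def plan_rereplication(block_map, live_nodes, replication=3):
    """Return (block, source, target) copy commands for under-replicated blocks."""
    commands = []
    for block, holders in sorted(block_map.items()):
        live_holders = sorted(holders & live_nodes)
        missing = replication - len(live_holders)
        if missing <= 0 or not live_holders:
            continue  # fully replicated, or no live replica left to copy from
        # Any live node that does not already hold the block can be a target.
        candidates = sorted(live_nodes - holders)
        for target in candidates[:missing]:
            commands.append((block, live_holders[0], target))
    return commands


block_map = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
live = {"dn2", "dn3", "dn4", "dn5"}  # dn1 has been marked dead
print(plan_rereplication(block_map, live))  # [('blk_1', 'dn2', 'dn4')]
```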

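When the dead DataNode rejoins the cluster, it reports its blocks to the NameNode, and those blocks are now over-replicated. HDFS automatically deletes the excess replicas so that the configured replication factor (3 by default) is maintained. Which replica gets removed is decided by the block placement policy, which by default tries to keep replicas spread across racks and prefers to delete from the node with the least free space, so it is not guaranteed to be the replica on the node that just came back.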
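A simplified sketch of how an excess replica might be picked, assuming only two criteria: keep rack diversity, then free up the fullest node. It ignores other factors the real default policy considers (such as heartbeat age and storage types), and `Replica` and `choose_excess_replica` are hypothetical names, not Hadoop classes.

```python
from collections import Counter, namedtuple

Replica = namedtuple("Replica", ["node", "rack", "free_bytes"])


def choose_excess_replica(replicas):
    """Pick one replica to delete from an over-replicated block."""
    per_rack = Counter(r.rack for r in replicas)
    # Prefer racks holding more than one replica, so that deleting one
    # does not reduce the number of racks the block lives on.
    shared = [r for r in replicas if per_rack[r.rack] > 1]
    candidates = shared or replicas
    # Among the candidates, delete from the node with the least free space.
    return min(candidates, key=lambda r: r.free_bytes)


replicas = [
    Replica("dn1", "rackA", 500),  # the node that just came back
    Replica("dn4", "rackA", 200),
    Replica("dn2", "rackB", 100),
    Replica("dn3", "rackB", 900),
]
print(choose_excess_replica(replicas).node)  # dn2
```

In this example the returned node `dn1` keeps its replica, because another node has less free space.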