I could not understand how the distances between the nodes came out as 0, 2, 4, and 6.

As per Hadoop: The Definitive Guide:

For example, imagine a node n1 on rack r1 in data center d1. This can be represented as /d1/r1/n1. Using this notation, here are the distances for the four scenarios:

• distance(/d1/r1/n1, /d1/r1/n1) = 0 (processes on the same node)

• distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack)

• distance(/d1/r1/n1, /d1/r2/n3) = 4 (nodes on different racks in the same data center)

• distance(/d1/r1/n1, /d2/r3/n4) = 6 (nodes in different data centers).

• distance(/d1/r1/n1, /d2/r3/n10) = ?

What is the network distance?

Let's imagine your cluster as a tree with the following levels:

• Abstract global root (Top or root)
• Data centers (1st level)
• Racks (2nd level)
• Nodes (3rd level or leaves)

If we draw this tree, with every data center, rack, and node as a circle connected to its parent, we can count the distance between any circle and its parent as 1.

Then the distance between any two circles is the sum of their distances to their closest common ancestor, or 0 if they are the same node.

So it's always 6 for any two nodes in different data centers (like between /d1/r1/n1 and /d2/r4/n10), because their closest common ancestor is the root, which is 3 steps away from each node.
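The counting rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop's actual implementation; the function name `distance` and the assumption that every location is a plain `/datacenter/rack/node` string are mine.

```python
def distance(a: str, b: str) -> int:
    """Hops from a up to the closest common ancestor, plus hops from b up to it."""
    # Split "/d1/r1/n1" into ["d1", "r1", "n1"] (assumed path format).
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")

    # Count the leading components both paths share; that shared prefix
    # is the closest common ancestor (an empty prefix means the root "/").
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1

    # Each remaining component is one step up the tree.
    return (len(path_a) - common) + (len(path_b) - common)


print(distance("/d1/r1/n1", "/d1/r1/n1"))   # 0: same node
print(distance("/d1/r1/n1", "/d1/r1/n2"))   # 2: same rack
print(distance("/d1/r1/n1", "/d1/r2/n3"))   # 4: same data center
print(distance("/d1/r1/n1", "/d2/r3/n10"))  # 6: different data centers
```

The node name itself (n4 or n10) does not matter; only the depth at which the two paths diverge does.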

OR

"The distance between two nodes is the sum of their distances to their closest common ancestor" (Hadoop: The Definitive Guide 4th ed, page 70)

distance(/d1/r1/n1, /d2/r3/n10) = 6

The closest common ancestor of the two nodes is the root, /.

The distance from n1 to / is 3 (n1 → r1 → d1 → /),

and the distance from n10 to / is 3 (n10 → r3 → d2 → /),

so the total is 3 + 3 = 6.

