Why are a minimum of 3 JournalNodes required in Hadoop HA architecture?

Question

I have installed Hadoop in multi-node distributed mode, and I need high availability (HA architecture) for my cluster, so I am planning to set up HA using the Quorum Journal Manager. While going through the official documentation, I found that there must be at least 3 JournalNode daemons. Can anyone help me understand why we need 3 JournalNodes?

kurt_cobain · Answer 1 · Apr 20, 2018

In Hadoop 1.x, the NameNode was a single point of failure: once the NameNode went down, the whole cluster went down. This is why Hadoop 2.x introduced a High Availability architecture with 2 NameNodes, where one is the Active NameNode and the other is the Standby (passive) NameNode.

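To make the cluster highly available, both NameNodes must stay in sync, and JournalNodes were introduced for this purpose. The Active NameNode writes every namespace modification (edit) to the JournalNodes, and the Standby NameNode reads those edits from the JournalNodes and applies them to its own namespace, so it is ready to take over at any time.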
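To make the Active/Standby flow concrete, here is a minimal Python sketch of a shared edit log. The classes `JournalNode`, `ActiveNameNode` and `StandbyNameNode` and the methods `log_edit` and `tail_edits` are hypothetical: this is a toy model, not Hadoop's actual code, and it assumes every JournalNode is healthy, so quorum handling is left out.

```python
# Toy model of the shared edit log; these classes are hypothetical, not Hadoop's API.
# Assumes every JournalNode is healthy (no quorum logic here).


class JournalNode:
    def __init__(self):
        self.edits = []  # (txid, operation) pairs persisted by this JournalNode


class ActiveNameNode:
    def __init__(self, journals):
        self.journals = journals
        self.next_txid = 1

    def log_edit(self, op):
        # Every namespace change is written to all JournalNodes with a new txid.
        for jn in self.journals:
            jn.edits.append((self.next_txid, op))
        self.next_txid += 1


class StandbyNameNode:
    def __init__(self):
        self.last_applied = 0
        self.namespace = []  # operations replayed so far

    def tail_edits(self, jn):
        # Replay, in order, only the edits newer than the last one applied.
        for txid, op in jn.edits:
            if txid > self.last_applied:
                self.namespace.append(op)
                self.last_applied = txid


journals = [JournalNode() for _ in range(3)]
active = ActiveNameNode(journals)
standby = StandbyNameNode()

active.log_edit("mkdir /data")
active.log_edit("create /data/a.txt")
standby.tail_edits(journals[0])  # every JournalNode holds the same edits here
print(standby.namespace)  # ['mkdir /data', 'create /data/a.txt']
```

Because the Standby tracks the last transaction it applied, tailing the same edits again (from the same or another JournalNode) does not replay them twice.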

Now imagine a setup with only one JournalNode, and that JournalNode fails. The whole purpose of High Availability is defeated, because the JournalNode itself becomes the single point of failure.

An edit is committed only after it has been written to a majority of the JournalNodes, so more than half of the JournalNodes must be healthy and running. With 2 JournalNodes, more than half means both must be up and running, so you cannot tolerate even a single JournalNode failure in this setup.
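The majority rule can be checked with a small Python sketch. `quorum_write` is a hypothetical helper, not part of Hadoop; it only models the rule that an edit is committed when more than half of the JournalNodes acknowledge it.

```python
# Hypothetical helper modelling the majority rule; not part of Hadoop.
def quorum_write(journal_up):
    """Return True if an edit can be committed.

    journal_up holds one boolean per JournalNode (True = healthy and reachable).
    The edit is committed only when more than half of the JournalNodes acknowledge it.
    """
    acks = sum(journal_up)
    return acks > len(journal_up) // 2


print(quorum_write([True, True]))         # True: both of 2 JournalNodes up
print(quorum_write([True, False]))        # False: 1 of 2 is not a majority
print(quorum_write([True, True, False]))  # True: 2 of 3 is a majority
```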

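Thus, 3 is the suggested minimum number of JournalNodes: with 3 nodes, a majority (2) is still available when one JournalNode fails. More generally, N JournalNodes can tolerate at most (N - 1) / 2 failures (rounded down), which is why an odd number of JournalNodes (3, 5, 7, ...) is recommended.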
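The same majority rule gives the failure tolerance for any number of JournalNodes. A minimal Python sketch, using a hypothetical helper `tolerated_failures` (not a Hadoop function):

```python
# Hypothetical helper: how many JournalNodes may fail while a majority survives.
def tolerated_failures(n):
    majority = n // 2 + 1  # smallest count that is more than half of n
    return n - majority


for n in range(1, 8):
    print(f"{n} JournalNodes: majority = {n // 2 + 1}, tolerated failures = {tolerated_failures(n)}")
```

Running it shows that 2 JournalNodes tolerate no more failures than 1, and 4 tolerate no more than 3, so adding a node to an odd count does not improve fault tolerance.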