How to install and configure a multi-node Hadoop cluster?


I am new to Big Data and Hadoop. I have 4 commodity-grade PCs on which I am planning to set up a multi-node Hadoop cluster. I will install Linux on them, keeping 1 master machine and 3 slave machines. Can you suggest which operating system I should use and how to set up a Hadoop multi-node cluster on them?

Mar 21, 2018 in Big Data Hadoop by kurt_cobain

1 answer to this question.


I would recommend installing CentOS on all of your machines.

Here is an example with two machines (master and slave) with the following IPs:

Master IP: 192.168.56.102

Slave IP: 192.168.56.103

STEP 1: Check the IP address of all machines.

Command: ip addr show (you can use the ifconfig command as well)

STEP 2: Disable the firewall restrictions.

Command: service iptables stop

Command: sudo chkconfig iptables off
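The iptables commands above apply to older, SysV-init releases of CentOS. On CentOS 7 and later, the firewall is firewalld managed by systemd, so the equivalent (sketched here as an assumption about your OS version) would be:

```shell
# CentOS 7+ equivalent of stopping and disabling the firewall (firewalld/systemd)
sudo systemctl stop firewalld
sudo systemctl disable firewalld
```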

STEP 3: Open the hosts file and add the master and slave nodes with their respective IP addresses.

Command: sudo nano /etc/hosts

192.168.56.102 master

192.168.56.103 slave1

The same entries should be present in the hosts file on both the master and slave machines.
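After saving /etc/hosts, it is worth confirming that the new hostnames resolve before moving on (run this from each machine; hostnames are the ones defined above):

```shell
# Verify hostname resolution on each node
ping -c 1 master
ping -c 1 slave1
```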

STEP 4: Restart the sshd service.

Command: service sshd restart

STEP 5: Create the SSH Key in the master node.

Command: ssh-keygen -t rsa -P ""

STEP 6: Copy the generated ssh key to master node’s authorized keys.

Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

STEP 7: Copy the master node’s ssh key to slave’s authorized keys.

Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave1
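Before continuing, you can confirm that passwordless SSH works (the hadoop user and the slave1 hostname are the ones assumed in this answer):

```shell
# Should print the slave's hostname without prompting for a password
ssh hadoop@slave1 hostname
```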

STEP 8: Download the Java 8 Package. Save this file in your home directory.

STEP 9: Extract the Java Tar File on all nodes.

Command: tar -xvf jdk-8u101-linux-i586.tar.gz

STEP 10: Download the Hadoop 2.7.3 Package on all nodes.

Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz

STEP 11: Extract the Hadoop tar File on all nodes.

Command: tar -xvf hadoop-2.7.3.tar.gz
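Note that the .bashrc in the next step points at /usr/lib, while the tar commands above extract into your home directory. One way to reconcile the two is to move the extracted directories; alternatively, edit the .bashrc paths to point at your home directory instead. A sketch, assuming the archive versions downloaded above (adjust names to match your files):

```shell
# Move the extracted JDK and Hadoop directories to the paths
# referenced by the .bashrc in step 12 (adjust version numbers as needed)
sudo mkdir -p /usr/lib/jvm
sudo mv ~/jdk1.8.0_101 /usr/lib/jvm/
sudo mv ~/hadoop-2.7.3 /usr/lib/
```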

STEP 12: Add the Hadoop and Java paths in the bash file (.bashrc) on all nodes.

Open the .bashrc file and add the Hadoop and Java paths as shown below:

Command:  sudo gedit .bashrc

export HADOOP_HOME=/usr/lib/hadoop-2.7.3

export HADOOP_CONF_DIR=/usr/lib/hadoop-2.7.3/etc/hadoop

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-2.7.3

export HADOOP_COMMON_HOME=/usr/lib/hadoop-2.7.3

export HADOOP_HDFS_HOME=/usr/lib/hadoop-2.7.3

export YARN_HOME=/usr/lib/hadoop-2.7.3

export PATH=$PATH:/usr/lib/hadoop-2.7.3/bin

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101

export PATH=$JAVA_HOME/bin:$PATH

Then, save the bash file and close it.

For applying all these changes to the current Terminal, execute the source command.

Command: source .bashrc

To make sure that Java and Hadoop have been properly installed on your system and can be accessed through the Terminal, execute the java -version and hadoop version commands.

Command: java -version

Command: hadoop version

STEP 13: Create and edit the masters file in the Hadoop configuration directory (hadoop-2.7.3/etc/hadoop) on both master and slave machines:

Command: sudo gedit masters
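The answer does not show the file contents. The masters file names the host that runs the SecondaryNameNode, so in this two-node example it would typically contain just the master's hostname:

```shell
# /home/hadoop/hadoop-2.7.3/etc/hadoop/masters (assumed contents)
master
```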

STEP 14: Edit slaves file in master machine as follows:

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves
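The slaves file on the master lists the hosts that will run the DataNode and NodeManager daemons. A sketch for this two-node example (add master to the list as well only if you want the master machine to store data too):

```shell
# /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves on the master (assumed contents)
slave1
```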

STEP 15: Edit slaves file in slave machine as follows:

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/slaves

STEP 16: Edit core-site.xml on both master and slave machines as follows:

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://master:9000</value>

</property>

</configuration>

STEP 17: Edit hdfs-site.xml on the master machine as follows:

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>/home/hadoop/hadoop-2.7.3/namenode</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/home/hadoop/hadoop-2.7.3/datanode</value>

</property>

</configuration>
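hdfs-site.xml points the NameNode and DataNode at local directories that may not exist yet; creating them up front (owned by the hadoop user assumed throughout this answer) avoids startup failures. The paths below are taken from the dfs.namenode.name.dir and dfs.datanode.data.dir values above:

```shell
# Create the storage directories referenced in hdfs-site.xml
mkdir -p /home/hadoop/hadoop-2.7.3/namenode   # needed on the master
mkdir -p /home/hadoop/hadoop-2.7.3/datanode   # needed on every node running a DataNode
```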

STEP 18: Edit hdfs-site.xml on slave machine as follows:

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/home/hadoop/hadoop-2.7.3/datanode</value>

</property>

</configuration>

STEP 19: Copy mapred-site.xml from its template in the configuration folder, then edit mapred-site.xml on both master and slave machines as follows:

Command: cp mapred-site.xml.template mapred-site.xml

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

STEP 20: Edit yarn-site.xml on both master and slave machines as follows:

Command: sudo gedit /home/hadoop/hadoop-2.7.3/etc/hadoop/yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

</configuration>

STEP 21: Format the NameNode (only on the master machine).

Command: hadoop namenode -format

STEP 22: Start all daemons (Only on master machine).

Command: ./sbin/start-all.sh (start-all.sh is deprecated; running ./sbin/start-dfs.sh followed by ./sbin/start-yarn.sh works as well)

STEP 23: Check all the daemons running on both master and slave machines.

Command: jps
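As a rough guide to what jps should report with this layout (the daemon names are standard; whether the master also shows DataNode and NodeManager depends on whether it appears in the slaves file):

```shell
jps
# Master, typically: NameNode, SecondaryNameNode, ResourceManager, Jps
# Slave,  typically: DataNode, NodeManager, Jps
```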

Finally, open http://master:50070/dfshealth.html in a browser on the master machine; this brings up the NameNode web interface.

answered Mar 21, 2018 by Shubham
