Help me set up a multi-node Hadoop cluster

I am new to Big Data systems, having completed a few Coursera certifications. I plan to build my own personal Hadoop cluster using 4 commodity-grade PCs. At present they all run Windows, but I am OK with installing Linux on them. I searched a lot on the internet for the setup process but found none (I found many for spinning up a cluster on AWS). At this time I am not restricted to any platform, but I would like all the tech to be free/open source. With 4 PCs I can have 1 master node and 3 data nodes. I would appreciate detailed steps (at least the broad contours) on how to spin up this bare-metal Hadoop cluster.
Jun 19 in Big Data Hadoop by nitinrawat895

1 answer to this question.


I can help you with this one.

Requirement: 1 master and 3 slaves (a Hadoop installation on a multi-node cluster)


Step 1: Get rid of Windows. Hadoop runs on Linux, so install a Linux distribution on all 4 machines; Ubuntu 14.04 or a later version works well (CentOS, Red Hat, etc. are fine too).

Step 2: Install and set up Java on every node. The Sun Java 6 PPA used in older tutorials is no longer maintained, so installing OpenJDK 7 from the standard Ubuntu repositories is the simpler route:

$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk

# Check which Java is the default on your machine.
# See 'sudo update-alternatives --config java' for more information.
$ java -version

Step 3: Set the path in the .bashrc file (open this file using a text editor such as vi or nano and append the text below; point JAVA_HOME at wherever your JDK actually lives, e.g. /usr/lib/jvm/java-7-openjdk-amd64 for the OpenJDK package above):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
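Reload the file and sanity-check the environment (the exact version string depends on the JDK you installed):

$ source ~/.bashrc
$ echo $JAVA_HOME
$ java -version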

Step 4: Add a dedicated user (not strictly required, but recommended). Run as root on every node:

# useradd hadoop 
# passwd hadoop

Step 5: Edit the hosts file in the /etc/ folder on all nodes, specifying the IP address of each system followed by its host name (open the file using vi /etc/hosts and append the text below):

<ip address of master node> hadoop-master 
<ip address of slave node 1> hadoop-slave-1 
<ip address of slave node 2> hadoop-slave-2
<ip address of slave node 3> hadoop-slave-3
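A quick check that the names resolve from any node (substitute your real IP addresses in the file above first):

$ ping -c 1 hadoop-master
$ getent hosts hadoop-slave-1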

Step 6: Set up SSH on every node so that they can communicate with one another without any password prompt. Run this as the hadoop user created in Step 4; the key is copied to the same user on each machine:

$ su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-3
$ chmod 0600 ~/.ssh/authorized_keys
$ exit

For more information on SSH, go to: https://www.ssh.com/ssh/
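To confirm passwordless SSH works, each of the following should print the slave's hostname without prompting for a password:

$ ssh hadoop@hadoop-slave-1 hostname
$ ssh hadoop@hadoop-slave-2 hostname
$ ssh hadoop@hadoop-slave-3 hostname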

Step 7: On the master server, download and install Hadoop.

# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
# tar -xzf hadoop-1.2.1.tar.gz
# mv hadoop-1.2.1 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/

Installation is finished here!
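Before configuring anything, you can confirm the unpacked build runs (from /opt/hadoop/hadoop/):

$ bin/hadoop version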

Next step: configuring Hadoop.

Step 1: Open core-site.xml (under /opt/hadoop/hadoop/conf/ in Hadoop 1.x) and edit it as below:

<configuration>
<property> 
  <name>fs.default.name</name> 
  <value>hdfs://hadoop-master:9000/</value> 
</property> 
<property> 
  <name>dfs.permissions</name> 
  <value>false</value> 
</property> 
</configuration>

Step 2: Open hdfs-site.xml and edit it as below:

<configuration>
<property> 
  <name>dfs.data.dir</name> 
  <value>/opt/hadoop/hadoop/dfs/data</value> 
  <final>true</final> 
</property> 

<property> 
  <name>dfs.name.dir</name> 
  <value>/opt/hadoop/hadoop/dfs/name</value> 
  <final>true</final> 
</property> 

<property> 
  <name>dfs.replication</name> 
  <value>3</value> 
</property> 
</configuration>
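HDFS will create these directories on first use as long as the hadoop user can write to /opt/hadoop, but it does no harm to create them up front (as root; the name dir matters on the master, the data dir on every slave):

# mkdir -p /opt/hadoop/hadoop/dfs/name /opt/hadoop/hadoop/dfs/data
# chown -R hadoop /opt/hadoop/hadoop/dfs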

Step 3: Open mapred-site.xml and edit it as below:

<configuration>
<property> 
  <name>mapred.job.tracker</name> 
  <value>hadoop-master:9001</value> 
</property> 
</configuration>

Step 4: Append the text below to hadoop-env.sh (again, adjust JAVA_HOME to your actual JDK location):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf

Step 5: Configure the master. In Hadoop 1.x the configuration files live under conf/, so edit conf/masters:

$ vi /opt/hadoop/hadoop/conf/masters
hadoop-master

Step 6: Copy the installation to the slave nodes as well --

# su hadoop 
$ cd /opt/hadoop 
$ scp -r hadoop hadoop-slave-1:/opt/hadoop 
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
$ scp -r hadoop hadoop-slave-3:/opt/hadoop
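The scp commands above assume /opt/hadoop already exists on each slave and is writable by the hadoop user; if it is not, create it first (as root on each slave):

# mkdir -p /opt/hadoop
# chown hadoop /opt/hadoop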

Step 7: Configure the slaves. On the master, list every slave host name in conf/slaves:

$ vi /opt/hadoop/hadoop/conf/slaves
hadoop-slave-1 
hadoop-slave-2
hadoop-slave-3

Step 8: Format the NameNode (DO THIS ONLY ONCE; formatting again later will permanently destroy all data in HDFS):

# su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
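If the format succeeds, the name directory will be populated; a quick way to check (the exact contents vary by version, but you should at least see a current/ subdirectory):

$ ls /opt/hadoop/hadoop/dfs/name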

You are all set!!

You can start all the daemons from the master as follows (in Hadoop 1.x the start and stop scripts live in bin/, not sbin/):

$ cd /opt/hadoop/hadoop
$ bin/start-all.sh
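Once the scripts finish, you can verify that everything came up by running jps (shipped with the JDK) on each node. On the master you should see NameNode, SecondaryNameNode and JobTracker; on each slave, DataNode and TaskTracker. The NameNode web UI at http://hadoop-master:50070/ is another quick check.

$ jps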
answered Jun 19 by ravikiran
