How to run Map Reduce program using Ubuntu terminal?


My Hadoop path is /usr/local/hadoop, the jars are in /usr/local/hadoop/share, and I am using Java 7 with JAVA_HOME=/usr/lib/jvm/jdk-7-amd64. Please help me figure out the problem.

Aug 7, 2018 in Big Data Hadoop by Frankie

1 answer to this question.


I used the following steps to run it from the terminal. My system is Ubuntu 14.04 LTS.

Follow these steps:

Compilation Process for MapReduce 

--> STEP 1. Start Hadoop:

$ start-all.sh

--> STEP 2. Check whether all the Hadoop daemons are up and running:

$ jps

--> STEP 3. Assuming environment variables are set as follows:

export JAVA_HOME=/usr/java/default          # Don't worry if your Java lives elsewhere; point JAVA_HOME at your actual JDK.
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar  # tools.jar is the MOST IMPORTANT file here. Make sure you have it; it may sit at a different location on your PC.
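Since tools.jar may live at a different location on your machine, a minimal sketch for checking it before exporting HADOOP_CLASSPATH might look like this (the `find_tools_jar` helper name and the example JDK path are illustrative, not part of Hadoop):

```shell
#!/bin/sh
# Sketch: check whether tools.jar exists under a given JAVA_HOME
# before using it as HADOOP_CLASSPATH.
find_tools_jar() {
    jhome="$1"
    if [ -f "$jhome/lib/tools.jar" ]; then
        echo "$jhome/lib/tools.jar"
    else
        echo "tools.jar not found under $jhome" >&2
        return 1
    fi
}

# Example (adjust the path to your own JDK):
# export HADOOP_CLASSPATH="$(find_tools_jar /usr/lib/jvm/jdk-7-amd64)"
```

If the function prints nothing and returns non-zero, search for tools.jar with `find / -name tools.jar 2>/dev/null` and use that path instead.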

--> STEP 4. Now copy your source code to the home directory. One note: it is not necessary to store the source code on HDFS.

--> STEP 5. Now it's time to compile the main code. Run the command below:

$ javac -classpath <hadoop-core.jar file> -d <your new directory>/ <sourceCode.java>

Meaning of this command:
* It compiles your Java source file, sourceCode.java.
* The <hadoop-core.jar file> must contain all the libraries mentioned in your source code. Here I suggest one file version and its download location:

http://www.java2s.com/Code/Jar/h/Downloadhadoop0201devcorejar.htm

On that page you will find a download link named hadoop-0.20.1-dev-core.jar.zip. Download and extract it; it produces one .jar file, which is the most important piece while compiling. In the command above, <hadoop-core.jar file> is this extracted .jar file.

* The -d option creates a directory for you and stores all the generated class files in it.

--> STEP 6. MapReduce code consists of three main components: 1. Mapper class, 2. Driver class, 3. Reducer class.
We therefore create one jar file that contains the class definitions of all three components.

Run the command below to generate the jar file:

$ jar -cvf <jar file to create> -C <directory from the previous step> .

* Remember, the trailing dot '.' is required; it stands for "all contents of the directory".
* Option -c creates a new archive,
  option -v generates verbose output on standard output,
  option -f specifies the archive file name.


For example:

$ javac -classpath hadoop-0.20.1-dev-core.jar -d LineCount/ LineCount.java   # creates the LineCount/ directory
$ jar -cvf LineCount.jar -C LineCount/ .                                     # creates LineCount.jar from the LineCount/ directory
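The compile and package commands above can be wrapped into one small helper script. This is only a convenience sketch (the `build_job` helper and the DRY_RUN switch are my own names, not part of Hadoop); with DRY_RUN=1 it just prints the commands it would run:

```shell
#!/bin/sh
# Sketch: compile a MapReduce source file, then package the class files
# into a jar named after the source file. Assumes <src> ends in .java.
run() {
    # With DRY_RUN=1 the command is printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$1"; else eval "$1"; fi
}

build_job() {
    core_jar="$1"           # e.g. hadoop-0.20.1-dev-core.jar
    src="$2"                # e.g. LineCount.java
    outdir="${src%.java}"   # class files go here; the jar is named after it
    run "javac -classpath $core_jar -d $outdir/ $src"
    run "jar -cvf $outdir.jar -C $outdir/ ."
}

# Usage: build_job hadoop-0.20.1-dev-core.jar LineCount.java
# Preview only: DRY_RUN=1 build_job hadoop-0.20.1-dev-core.jar LineCount.java
```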


--> STEP 7. Now it's time to run your code on the Hadoop framework.
Make sure your input files are already on HDFS. If not, add them using:

$ hadoop fs -put <source file path> /input


--> STEP 8. Now run your program using your jar file:

$ hadoop jar <your jar file> <main class name> /input/<your input file> /output/<output directory name>

For example, if my jar file is test.jar,
my main class is test,
my input file is /input/a.txt,
and I want the output in /output/test, then my command is:

$ hadoop jar test.jar test /input/a.txt /output/test

--> STEP 9. If you have made it this far, you have crossed a thousand error bridges where other programmers are still stuck.

After your program completes successfully, the /output directory contains two files:

one is _SUCCESS, an empty marker file indicating that the job completed;
the second is part-r-00000, which contains the actual output.

Read it using:

$ hadoop fs -cat /output/<your file>/part-r-00000
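For a line-count job like the LineCount example above, you can sanity-check the number in part-r-00000 against a plain local count of the same input, with no Hadoop involved (this assumes your job counts lines, which is my reading of the example; the /tmp/a.txt path is illustrative):

```shell
#!/bin/sh
# Local sanity check: the total a line-count job reports for a.txt
# should match what wc -l computes on the same file.
printf 'first line\nsecond line\nthird line\n' > /tmp/a.txt
wc -l < /tmp/a.txt
# prints: 3
```

If the local count and the job's output disagree, the bug is in your Mapper/Reducer logic, not in the submission commands.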


IMPORTANT NOTES :

1. If you get an auxService error while submitting the job, make sure YARN (the resource manager) has the auxiliary services configured. If not, add the following lines to your yarn-site.xml file, located at /usr/local/hadoop/etc/hadoop:

<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

2. If you get an error for Job.getInstance while running the code on Hadoop, your Hadoop version's API most likely does not provide that factory method; simply replace your Job.getInstance(...) statement with the constructor form:

Job job = new Job(configurationObject, "Job Dummy Name");


References:
https://dataheads.wordpress.com/2013/11/21/hadoop-2-setup-on-64-bit-ubuntu-12-04-part-1/
https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using-map-reduce-framework
https://sites.google.com/site/hadoopandhive/home/how-to-run-and-compile-a-hadoop-program
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core
answered Aug 7, 2018 by Neha
