How to run Map Reduce program using Ubuntu terminal?


My Hadoop path is /usr/local/hadoop, the jars are in /usr/local/hadoop/share, and I am using Java 7 with JAVA_HOME=/usr/lib/jvm/jdk-7-amd64. Please help me figure out the problem.

Aug 7, 2018 in Big Data Hadoop by Frankie

1 answer to this question.


I used the following steps to run it from the terminal. My system is Ubuntu 14.04 LTS.

Follow these steps:

Compilation Process for MapReduce 

--> STEP 1. Start Hadoop:

$ start-all.sh

--> STEP 2. Check whether all the Hadoop daemons are up and running:

$ jps

--> STEP 3. Assuming environment variables are set as follows:

export JAVA_HOME=/usr/java/default          # Don't worry if your Java lives elsewhere; point JAVA_HOME at your actual JDK.
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar  # tools.jar is the MOST IMPORTANT file here. Make sure you have it; it may sit at a different location on your PC.
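Since tools.jar may live at a different location on your machine, a minimal sketch for checking it before exporting HADOOP_CLASSPATH might look like this (the `find_tools_jar` helper name and the example JDK path are illustrative, not part of Hadoop):

```shell
#!/bin/sh
# Sketch: check whether tools.jar exists under a given JAVA_HOME
# before using it as HADOOP_CLASSPATH.
find_tools_jar() {
    jhome="$1"
    if [ -f "$jhome/lib/tools.jar" ]; then
        echo "$jhome/lib/tools.jar"
    else
        echo "tools.jar not found under $jhome" >&2
        return 1
    fi
}

# Example (adjust the path to your own JDK):
# export HADOOP_CLASSPATH="$(find_tools_jar /usr/lib/jvm/jdk-7-amd64)"
```

If the function prints nothing and returns non-zero, search for tools.jar with `find / -name tools.jar 2>/dev/null` and use that path instead.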

--> STEP 4. Now copy your source code to the home directory. One note: it is not necessary to store the source code on HDFS.

--> STEP 5. Now it's time to compile the main code. Run the command below:

$ javac -classpath <hadoop-core.jar file> -d <your new directory>/ <sourceCode.java>

Meaning of this command:
* It compiles your Java source file, sourceCode.java.
* The <hadoop-core.jar file> must contain all the libraries mentioned in your source code. Here I suggest one file version and its download location:

http://www.java2s.com/Code/Jar/h/Downloadhadoop0201devcorejar.htm

On that page you will find a download link named hadoop-0.20.1-dev-core.jar.zip. Download and extract it; it produces one .jar file, which is the most important piece while compiling. In the command above, <hadoop-core.jar file> is this extracted .jar file.

* The -d option creates a directory for you and stores all the generated class files in it.

--> STEP 6. MapReduce code consists of three main components: 1. Mapper class, 2. Driver class, 3. Reducer class.
We therefore create one jar file that contains the class definitions of all three components.

Run the command below to generate the jar file:

$ jar -cvf <jar file to create> -C <directory from the previous step> .

* Remember, the trailing dot '.' is required; it stands for "all contents of the directory".
* Option -c creates a new archive,
  option -v generates verbose output on standard output,
  option -f specifies the archive file name.


For example:

$ javac -classpath hadoop-0.20.1-dev-core.jar -d LineCount/ LineCount.java   # creates the LineCount/ directory
$ jar -cvf LineCount.jar -C LineCount/ .                                     # creates LineCount.jar from the LineCount/ directory
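The compile and package commands above can be wrapped into one small helper script. This is only a convenience sketch (the `build_job` helper and the DRY_RUN switch are my own names, not part of Hadoop); with DRY_RUN=1 it just prints the commands it would run:

```shell
#!/bin/sh
# Sketch: compile a MapReduce source file, then package the class files
# into a jar named after the source file. Assumes <src> ends in .java.
run() {
    # With DRY_RUN=1 the command is printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$1"; else eval "$1"; fi
}

build_job() {
    core_jar="$1"           # e.g. hadoop-0.20.1-dev-core.jar
    src="$2"                # e.g. LineCount.java
    outdir="${src%.java}"   # class files go here; the jar is named after it
    run "javac -classpath $core_jar -d $outdir/ $src"
    run "jar -cvf $outdir.jar -C $outdir/ ."
}

# Usage: build_job hadoop-0.20.1-dev-core.jar LineCount.java
# Preview only: DRY_RUN=1 build_job hadoop-0.20.1-dev-core.jar LineCount.java
```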


--> STEP 7. Now it's time to run your code on the Hadoop framework.
Make sure your input files are already on HDFS. If not, add them using:

$ hadoop fs -put <source file path> /input


--> STEP 8. Now run your program using your jar file:

$ hadoop jar <your jar file> <main class name> /input/<your input file> /output/<output directory name>

For example, if my jar file is test.jar,
my main class is test,
my input file is /input/a.txt,
and I want the output in /output/test, then my command is:

$ hadoop jar test.jar test /input/a.txt /output/test

--> STEP 9. If you have made it this far, you have crossed a thousand error bridges where other programmers are still stuck.

After your program completes successfully, the /output directory contains two files:

one is _SUCCESS, an empty marker file indicating that the job completed;
the second is part-r-00000, which contains the actual output.

Read it using:

$ hadoop fs -cat /output/<your file>/part-r-00000
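For a line-count job like the LineCount example above, you can sanity-check the number in part-r-00000 against a plain local count of the same input, with no Hadoop involved (this assumes your job counts lines, which is my reading of the example; the /tmp/a.txt path is illustrative):

```shell
#!/bin/sh
# Local sanity check: the total a line-count job reports for a.txt
# should match what wc -l computes on the same file.
printf 'first line\nsecond line\nthird line\n' > /tmp/a.txt
wc -l < /tmp/a.txt
# prints: 3
```

If the local count and the job's output disagree, the bug is in your Mapper/Reducer logic, not in the submission commands.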


IMPORTANT NOTES :

1. If you get an auxService error while submitting the job, make sure YARN (the resource manager) has the auxiliary services configured. If not, add the following lines to your yarn-site.xml file, located at /usr/local/hadoop/etc/hadoop:

<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

2. If you get an error for Job.getInstance while running the code on Hadoop, your Hadoop version's API most likely does not provide that factory method; simply replace your Job.getInstance(...) statement with the constructor form:

Job job = new Job(configurationObject, "Job Dummy Name");


References:
https://dataheads.wordpress.com/2013/11/21/hadoop-2-setup-on-64-bit-ubuntu-12-04-part-1/
https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using-map-reduce-framework
https://sites.google.com/site/hadoopandhive/home/how-to-run-and-compile-a-hadoop-program
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core
answered Aug 7, 2018 by Neha
