Brief Introduction to Oozie

Workflow Example:

<workflow-app name='wordcount-wf' xmlns='uri:oozie:workflow:0.1'>
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>foo.com:9001</job-tracker>
            <name-node>hdfs://bar.com:9000</name-node>
            <configuration>
                <!-- Input and output directories are passed in as job parameters -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <!-- The workflow schema expects a message inside the kill node -->
    <kill name='kill'>
        <message>Word count action failed</message>
    </kill>
    <end name='end'/>
</workflow-app>

Workflow Definition:

A workflow definition is a directed acyclic graph (DAG) made up of control flow nodes and action nodes, with the nodes connected by transition arrows.

Control Flow Nodes:

Control flow nodes determine the execution path of a workflow: where it starts, where it ends, and how it branches. A workflow application uses the following control flow nodes:

Start/end/kill

Decision

Fork/join
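
As a rough sketch of how these nodes might look inside a workflow definition, the fragment below branches on whether the input exists and then starts two actions in parallel. The node names (check-input, split, merge, first-job, second-job) are hypothetical and the forked actions themselves are left out, so this is an illustration rather than a complete workflow.

```xml
<!-- Illustrative fragment only; all node names are made up for this sketch. -->
<start to="check-input"/>

<!-- Decision: the first case whose predicate is true is taken, otherwise the default. -->
<decision name="check-input">
    <switch>
        <case to="split">${fs:exists(inputDir)}</case>
        <default to="kill"/>
    </switch>
</decision>

<!-- Fork: starts both paths concurrently. -->
<fork name="split">
    <path start="first-job"/>
    <path start="second-job"/>
</fork>

<!-- Join: waits for every forked path to arrive before moving on. -->
<!-- Each forked action (not shown) would transition to "merge" on success. -->
<join name="merge" to="end"/>

<kill name="kill">
    <message>Input directory is missing</message>
</kill>
<end name="end"/>
```

Fork and join are used in pairs: every path started by a fork should end at the same join.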

Action Nodes:

Map-reduce

Pig

HDFS

Sub-workflow

Java – Run custom Java code
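
A minimal sketch of what a few of these action nodes could look like is given below. The action names, script name, class name and application path are placeholders chosen for illustration, and ${jobTracker} and ${nameNode} are assumed to be supplied as job parameters.

```xml
<!-- Illustrative fragment only; names, paths and classes are placeholders. -->
<action name="pig-step">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>wordcount.pig</script>
        <param>INPUT=${inputDir}</param>
    </pig>
    <ok to="java-step"/>
    <error to="kill"/>
</action>

<action name="java-step">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Hypothetical class with a main() method, packaged with the workflow -->
        <main-class>com.example.PostProcess</main-class>
        <arg>${outputDir}</arg>
    </java>
    <ok to="child-step"/>
    <error to="kill"/>
</action>

<action name="child-step">
    <sub-workflow>
        <!-- Hypothetical HDFS path of another deployed workflow application -->
        <app-path>${nameNode}/apps/child-wf</app-path>
        <propagate-configuration/>
    </sub-workflow>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

Every action node carries two transitions: ok, which is followed when the action succeeds, and error, which is followed when it fails.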

Comments

Rajiv says:
Dec 31, 2016 at 11:14 am GMT
sir how to schedule job using crontab
- EdurekaSupport says:
  Jan 4, 2017 at 2:35 pm GMT
  Hey Rajiv, thanks for checking out our blog. Please refer to the steps given below to set up a cron job:
  1. Prepare the SQL to be run by cron.
  2. Wrap the SQL code in the BTEQ commands shown below so that it can run as a cron job:
  .logon server/user_id, Teradata password
  For example :
  .logon Mozart/akatarni,Welcome1
  ADD THE SQL CODE HERE
  .logoff
  .quit
  .exit
  3. WinSCP is the file transfer application used to copy the .sql file to the server.
  a. Open “WinSCP” and set Server name = phximdsas02.phx.ebay.com
  b. Enter your login ID and SAS password.
  c. Copy the file from your system to the server window. In the attached screenshot, “ask_lstg.sql” is copied from genpact (personal system) to the server window.
  i. The left window shows your personal computer and the right one shows the server.
  4. Open “Putty” and connect to the server phximdsas02.phx.ebay.com.
  https://uploads.disquscdn.com/images/78d54b229f0ce485a72b7984a886306720904f6e58179446069905356f639f94.png
  5. At the prompt, enter your SAS credentials. After you enter the password, you will see the attached window.
  https://uploads.disquscdn.com/images/d89991fa7351bb7a80085c13e9ec8028f333f79c3ab3e6da6c406491ceb53dde.png
  6. To open the editor:
  a. Type export EDITOR=vi <hit enter>
  b. Type crontab -e <hit enter>
  i. This command opens your crontab file for editing, or creates one if it doesn’t already exist.
  c. Press “i” to enter insert mode and start typing
  d. Press <ESC> to leave insert mode
  7. Then make the cron job entry:
  A crontab entry has five time fields (minute, hour, day of month, month, day of week), followed by the command to run on that schedule.
  00 06 * * * /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1
  The above runs the code at 06:00 every day.
  In the above example, “fake_lstg.sql” is the SQL file read as input and “fake_lstg.LOG” is the log file where the results will appear; 2>&1 sends error messages to the same log.
  15 20 * * 0 /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1
  The above runs the code at 20:15 every Sunday.
  https://uploads.disquscdn.com/images/4b910f3e7b76f35e76b9e9a338c5f547932c7c2e897d8b8a109c15c224fa0e01.png
  8. Keep adding lines to the crontab file to schedule more jobs.
  a. The easiest way to add a line is to move to the first character in the file and then, in command mode (after pressing <ESC>),
  press <shift> + O (case sensitive). This opens a new line above the current one.
  9. To move around the file in command mode:
  “l” – move right
  “h” – move left
  “j” – move down
  “k” – move up
  10. To save the crontab file and exit, press <ESC>, then type :wq
  a. To exit the file WITHOUT saving, press <ESC>, then type :q!
  11. Type exit at the Unix prompt to close Putty.
  12. The cron job should run at the specified time.
  13. Check the *.LOG file to make sure the code ran successfully.
  Hope this helps. Cheers!
  - Rajiv says:
    Jan 4, 2017 at 3:18 pm GMT
    sir thanks for giving answer to my question..its helpful for me…good and fine description..thanks to u sir
Sankalp Tomar says:
Aug 19, 2016 at 12:55 pm GMT
Hi,
Suppose we want to use the output of Hive Job as an input to Mapreduce Job. How can we achieve this??
- EdurekaSupport says:
  Jan 5, 2017 at 7:31 am GMT
  Hey Sankalp, thanks for checking out our blog. With regard to your query, first store the output of the Hive query in HDFS, then use that output directory as the input path for the MapReduce job.
  Storing the output of Hive (replace my_table with your own table name):
  INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  SELECT books FROM my_table;
  Hope this helps. Cheers!
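
  In Oozie terms, the same idea could be sketched as a workflow that runs a Hive action first and then hands its output directory to a map-reduce action. Everything named here is an assumption made for illustration: the workflow and action names, the script export_books.q (assumed to contain the INSERT OVERWRITE DIRECTORY statement and write to ${OUTPUT}), and the ${hiveOutputDir} parameter. The schema versions may differ on your installation, and the mapper and reducer settings are left out.

  ```xml
  <!-- Illustrative sketch only; names, script and parameters are hypothetical. -->
  <workflow-app name="hive-then-mr-wf" xmlns="uri:oozie:workflow:0.2">
      <start to="hive-export"/>

      <!-- Step 1: the Hive script writes its result to ${hiveOutputDir} -->
      <action name="hive-export">
          <hive xmlns="uri:oozie:hive-action:0.2">
              <job-tracker>${jobTracker}</job-tracker>
              <name-node>${nameNode}</name-node>
              <script>export_books.q</script>
              <param>OUTPUT=${hiveOutputDir}</param>
          </hive>
          <ok to="mr-step"/>
          <error to="kill"/>
      </action>

      <!-- Step 2: the MapReduce job reads the directory written by Hive -->
      <action name="mr-step">
          <map-reduce>
              <job-tracker>${jobTracker}</job-tracker>
              <name-node>${nameNode}</name-node>
              <configuration>
                  <property>
                      <name>mapred.input.dir</name>
                      <value>${hiveOutputDir}</value>
                  </property>
                  <property>
                      <name>mapred.output.dir</name>
                      <value>${outputDir}</value>
                  </property>
              </configuration>
          </map-reduce>
          <ok to="end"/>
          <error to="kill"/>
      </action>

      <kill name="kill">
          <message>Hive export or MapReduce step failed</message>
      </kill>
      <end name="end"/>
  </workflow-app>
  ```

  Because the map-reduce action is reached only through the ok transition of the Hive action, it starts only after the Hive output has been written.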
