Brief Introduction to Oozie

Last updated on May 22, 2019

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs, such as Java MapReduce, Streaming MapReduce, Pig, Hive and Sqoop. Oozie is scalable, reliable and extensible, and it is used in production at Yahoo!, where it runs more than 200,000 jobs every day.

Features of Oozie:

  • Execute and monitor workflows in Hadoop
  • Schedule workflows periodically
  • Trigger execution when data becomes available
  • HTTP and command-line interfaces, plus a web console
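
Periodic scheduling and data-availability triggers are handled by an Oozie coordinator application that wraps a workflow. The following is only a minimal sketch and is not taken from the original example: the application name, the dataset name, the dates, the HDFS path layout and the `nameNode` and `workflowAppPath` parameters are all assumptions chosen for illustration.

```xml
<!-- Hypothetical coordinator: runs a workflow once a day, but only after that day's input exists -->
<coordinator-app name="wordcount-coord" frequency="${coord:days(1)}"
                 start="2019-01-01T00:00Z" end="2019-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <!-- Assumed layout: one directory per day, marked complete by a _SUCCESS file -->
        <dataset name="logs" frequency="${coord:days(1)}"
                 initial-instance="2019-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <!-- The action waits until the current day's dataset instance is available -->
        <data-in name="input" dataset="logs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
            <configuration>
                <!-- Pass the resolved dataset path to the workflow as inputDir -->
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

Under these assumptions, the coordinator would create one workflow run per day and hold each run until the `_SUCCESS` flag for that day's directory appears.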

Oozie Workflow – Directed Acyclic Graph of Jobs:

Oozie Workflow Example:

<workflow-app name='wordcount-wf'>
  <start to='wordcount'/>
  <action name='wordcount'>
    <!-- ... job configuration elided ... -->
    <value>${outputDir}</value>
    <ok to='end'/>
    <error to='kill'/>
  </action>
  <kill name='kill'/>
  <end name='end'/>
</workflow-app>

Workflow Definition:

A workflow definition is a directed acyclic graph (DAG) made up of control flow nodes and action nodes, where the nodes are connected by transition arrows.

Control Flow Nodes:

Control flow nodes provide a way to control the workflow execution path. Within a workflow application, flow control is expressed through the following nodes:

  • Start/end/kill
  • Decision
  • Fork/join
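
To make the control flow nodes concrete, here is a hedged sketch of a workflow that uses each of them. The node names, the `inputDir` and `outputDir` parameters and the placeholder `fs` actions are assumptions for illustration. They are not part of the wordcount example above.

```xml
<workflow-app name="control-flow-sketch" xmlns="uri:oozie:workflow:0.5">
    <!-- Start: the entry point of the workflow -->
    <start to="check-input"/>

    <!-- Decision: follow the first case whose predicate is true, otherwise the default -->
    <decision name="check-input">
        <switch>
            <case to="split">${fs:exists(inputDir)}</case>
            <default to="fail"/>
        </switch>
    </decision>

    <!-- Fork: start both paths concurrently -->
    <fork name="split">
        <path start="branch-a"/>
        <path start="branch-b"/>
    </fork>

    <!-- Placeholder actions; outputDir is assumed to be a full HDFS URI -->
    <action name="branch-a">
        <fs><mkdir path="${outputDir}/a"/></fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="branch-b">
        <fs><mkdir path="${outputDir}/b"/></fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <!-- Join: wait for every forked path to finish before continuing -->
    <join name="merge" to="end"/>

    <!-- Kill: end the workflow in a failed state with a message -->
    <kill name="fail">
        <message>Workflow failed at node ${wf:lastErrorNode()}</message>
    </kill>
    <!-- End: the workflow completed successfully -->
    <end name="end"/>
</workflow-app>
```

Every fork is paired with a join, so the two branches run in parallel and the workflow only moves on once both have completed.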

Action Nodes:

  • Map-reduce
  • Pig
  • HDFS
  • Sub-workflow
  • Java – Run custom Java code
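
As a rough illustration of how action nodes look inside a `workflow-app`, the fragment below chains a Pig action to an HDFS (`fs`) action. This is a sketch under assumptions: the action names, the script name `clean.pig`, the archive path and the `jobTracker`, `nameNode`, `inputDir` and `outputDir` parameters are hypothetical, and the `kill` and `end` nodes they transition to are expected to be defined elsewhere in the same workflow.

```xml
<!-- Pig action: runs a script shipped with the workflow application -->
<action name="clean-logs">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>clean.pig</script>
        <!-- Values handed to the Pig script as parameters -->
        <param>INPUT=${inputDir}</param>
        <param>OUTPUT=${outputDir}</param>
    </pig>
    <!-- Each action declares where to go on success and on failure -->
    <ok to="archive"/>
    <error to="kill"/>
</action>

<!-- HDFS action: file system operations, no job is launched -->
<action name="archive">
    <fs>
        <mkdir path="${nameNode}/archive"/>
        <move source="${outputDir}" target="${nameNode}/archive/wordcount"/>
    </fs>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

The `ok` and `error` elements are the transitions that connect each action to the next node in the graph.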

Oozie Workflow Application:

A workflow application is a ZIP file that includes the workflow definition and the files needed to run all of its actions. It contains the following files:

  • Configuration file – config-default.xml
  • App files – lib/ directory with JAR and SO files
  • Pig scripts
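
The `config-default.xml` file uses the Hadoop configuration format and supplies default values for workflow parameters. Values given at submission time take precedence. A minimal sketch, assuming hypothetical parameter names and paths:

```xml
<!-- config-default.xml: defaults used when the job configuration does not set a parameter -->
<configuration>
    <property>
        <!-- Hypothetical input location -->
        <name>inputDir</name>
        <value>/tmp/wordcount/input</value>
    </property>
    <property>
        <!-- Hypothetical output location, referenced in the workflow as ${outputDir} -->
        <name>outputDir</name>
        <value>/tmp/wordcount/output</value>
    </property>
</configuration>
```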

Application Deployment:

$ hadoop fs -put wordcount-wf hdfs://

Workflow Job Parameters:

$ cat job.properties

Job Execution:

$ oozie job -run -config
