Pig Programming | Create Your First Apache Pig Script

In this installment of our Hadoop Tutorial Series, we will learn how to create an Apache Pig script. A Pig script groups a set of Apache Pig commands so that they run together, which saves the time and effort of typing and executing each command manually. Pig scripting is also an integral part of the Hadoop course curriculum. This blog is a step-by-step guide to creating your first Apache Pig script.

Apache Pig Script Execution Modes

Local Mode: In ‘local mode’, Pig runs the script against the local file system. You don’t need to store the data in HDFS; you can work directly with data stored on the local file system.

MapReduce Mode: In ‘MapReduce mode’, the data must be stored in HDFS, and the Pig script processes it there.
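
As a rough illustration of how the two modes are selected, the Python sketch below builds the pig command line for each mode. The -x local and -x mapreduce options are Pig's execution-type flags; the helper name build_pig_command is an assumption made for this example, and the script path is simply the one used later in this post.

```python
import shlex

# Execution types accepted by Pig's -x option that this post covers.
SUPPORTED_MODES = ("local", "mapreduce")


def build_pig_command(script_path, mode="mapreduce"):
    """Return the argument list that runs a Pig script in the given mode.

    build_pig_command is a hypothetical helper for illustration only.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["pig", "-x", mode, script_path]


if __name__ == "__main__":
    # Print the shell command for each mode without executing it.
    for mode in SUPPORTED_MODES:
        print(shlex.join(build_pig_command("/home/edureka/output.pig", mode)))
```

When pig is run without the -x option it uses MapReduce mode, which is why the execution command in Step 2 does not pass a mode explicitly.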

Apache Pig Script in MapReduce Mode

Let us say our task is to read data from a data file and to display the required contents on the terminal as output.

Create a sample data file and save it as a text file with the name ‘information.txt’.

The sample data file contains five columns, FirstName, LastName, MobileNo, City, and Profession, separated by tabs. Our task is to read the content of this file from HDFS and display the FirstName, MobileNo, and Profession columns of each record.
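
For readers who want a file to experiment with, here is a minimal Python sketch that writes a tab-separated file in the five-column layout described above. The rows are hypothetical placeholders, not the original sample data, and the helper name write_sample_file is an assumption made for this example.

```python
import csv
import os
import tempfile

# Column order described in the post (the file itself has no header row).
COLUMNS = ["FirstName", "LastName", "MobileNo", "City", "Profession"]

# Hypothetical placeholder rows; they are NOT the original sample data.
SAMPLE_ROWS = [
    ["Asha", "Rao", "5550100", "Pune", "Engineer"],
    ["Ben", "Cole", "5550101", "Austin", "Teacher"],
]


def write_sample_file(path, rows=SAMPLE_ROWS):
    """Write rows as tab-separated lines, one record per line, no header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Write into a temporary directory so nothing existing is overwritten.
    target = os.path.join(tempfile.mkdtemp(), "information.txt")
    write_sample_file(target)
    with open(target, encoding="utf-8") as handle:
        print(handle.read(), end="")
```

To follow the post exactly, point the path at /home/edureka/information.txt instead of the temporary directory.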

To process this data using Pig in MapReduce mode, the file must be present in HDFS. Copy it from the local file system:

Command: hadoop fs -copyFromLocal /home/edureka/information.txt /edureka

Step 1: Writing a Pig script

Create and open an Apache Pig script file in an editor (e.g. gedit).

Command: sudo gedit /home/edureka/output.pig

This command will create an ‘output.pig’ file inside the home directory of the edureka user.

Let’s write a few Pig commands in the output.pig file.


A = LOAD '/edureka/information.txt' USING PigStorage('\t') AS (FName:chararray, LName:chararray, MobileNo:chararray, City:chararray, Profession:chararray);

B = FOREACH A GENERATE FName, MobileNo, Profession;

DUMP B;

Save and close the file.

The first command loads the file ‘information.txt’ into relation A with the declared schema (FName, LName, MobileNo, City, Profession).
The second command projects the FName, MobileNo, and Profession fields from relation A into relation B.
The third command displays the content of relation B on the terminal/console.
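
The three steps above can be mimicked in plain Python to make the data flow concrete. This is only an analogy under stated assumptions: it is not how Pig executes a script, the function names load, foreach_generate, and dump are invented for this sketch, and the two input records are hypothetical placeholders.

```python
import io

# Field names from the schema declared in the LOAD statement.
FIELDS = ("FName", "LName", "MobileNo", "City", "Profession")


def load(lines, delimiter="\t", fields=FIELDS):
    """Analogue of LOAD ... USING PigStorage: split each line into named fields."""
    return [dict(zip(fields, line.rstrip("\n").split(delimiter))) for line in lines]


def foreach_generate(relation, *names):
    """Analogue of FOREACH ... GENERATE: keep only the named fields, in order."""
    return [tuple(record[name] for name in names) for record in relation]


def dump(relation):
    """Analogue of DUMP: print one parenthesised tuple per line."""
    for row in relation:
        print("(" + ",".join(row) + ")")


# Hypothetical stand-in for information.txt; NOT the original sample data.
SAMPLE = io.StringIO(
    "Asha\tRao\t5550100\tPune\tEngineer\n"
    "Ben\tCole\t5550101\tAustin\tTeacher\n"
)

A = load(SAMPLE)                                            # A = LOAD ...
B = foreach_generate(A, "FName", "MobileNo", "Profession")  # B = FOREACH A GENERATE ...
dump(B)                                                     # DUMP B;
```

Each record keeps only three of its five fields, so LName and City never reach the output.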

Step 2: Execute the Apache Pig Script

To execute the Pig script in MapReduce mode, run the following command:

Command: pig /home/edureka/output.pig

After the execution finishes, review the result. The images below show the results and the intermediate map and reduce functions.

The image below shows that the script executed successfully.

The image below shows the result of our script.

Congratulations on executing your first Apache Pig script successfully!

Now you know how to create and execute an Apache Pig script. Our next blog in the Hadoop Tutorial Series will cover how to create UDFs (User Defined Functions) in Apache Pig and execute them in MapReduce/HDFS mode.
