Why is Apache Pig used instead of writing Hadoop MapReduce directly?

0 votes
I understand that Pig runs on top of Apache Hadoop and uses its own language, called Pig Latin. But I am confused about why Pig was developed in the first place. In other words, I would like to know which features Pig provides that Apache Hadoop does not. Why should I take on the overhead of learning a new language, Pig Latin?
May 7, 2018 in Big Data Hadoop by Meci Matt
• 9,420 points
192 views

1 answer to this question.

0 votes
As you know, writing MapReduce programs in Java or any other language is quite complex. You may have to write around a hundred lines of Java code just to do a simple sort. Also, in a company you have many people (analysts) who are comfortable writing SQL-like queries and therefore want similar functionality out of the box from Hadoop. Pig provides the following features:
  • Pig Latin is a high-level data flow language, whereas MapReduce is a low-level data processing paradigm.
  • Without writing complex Java implementations in MapReduce, programmers can achieve the same implementations very easily using Pig Latin.
  • Apache Pig uses a multi-query approach (i.e. a single Pig Latin query can accomplish multiple MapReduce tasks), which is often said to cut the code length by roughly 20 times, and development time drops correspondingly.
  • Pig provides many built-in operators for data operations such as joins, filters, and ordering, whereas implementing the same functionality in MapReduce is a humongous task.
  • In addition, it provides nested data types such as tuples, bags, and maps, which MapReduce lacks.
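To make the brevity claim concrete, here is the classic word count, which takes dozens of lines of Java MapReduce, written as a handful of Pig Latin statements. The input and output paths and the schema are placeholders, not anything from the original question:

```pig
-- Load raw text lines (path and schema are illustrative)
lines = LOAD '/data/input.txt' AS (line:chararray);
-- TOKENIZE splits a line into a bag of words; FLATTEN turns the bag into rows
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- GROUP + COUNT is the whole "reduce" phase
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
ordered = ORDER counts BY cnt DESC;
STORE ordered INTO '/data/wordcount_out';
```

Each of these statements compiles down to one or more MapReduce stages; the programmer never touches a Mapper or Reducer class.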
Basically, Pig provides an abstraction that avoids the complexity of writing MapReduce programs, offering query operations like joins and group-by out of the box. This makes a data engineer's life easier when managing data and running ad hoc queries on it.

 

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation that makes MapReduce programming high-level, similar to what SQL does for RDBMS systems. Pig Latin can be extended using UDFs (User-Defined Functions), which the user can write in Java, Python, or JavaScript and then call directly from the language.
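As a sketch of the UDF mechanism, here is a hypothetical Python function registered via Jython and called from Pig Latin. The file name `udfs.py`, the function `normalize`, and the data file are all assumptions made up for illustration:

```python
# udfs.py -- hypothetical Python UDF for Pig (runs under Jython)
# @outputSchema tells Pig the type of the value this function returns
@outputSchema("name:chararray")
def normalize(s):
    # Trim whitespace and lowercase; pass nulls through unchanged
    return s.strip().lower() if s else s
```

```pig
-- Register the Python file so its functions are callable from Pig Latin
REGISTER 'udfs.py' USING jython AS myfuncs;
users = LOAD 'users.tsv' AS (name:chararray, city:chararray);
-- Call the UDF like any built-in function
cleaned = FOREACH users GENERATE myfuncs.normalize(name) AS name, city;
```

The same extension point exists for Java and JavaScript UDFs; Python via Jython is just the lightest-weight option to sketch.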

Now, the keywords above are high-level and abstracts. Just as DBAs can create and manage databases without knowing any major programming language beyond SQL, data engineers can create and manage data pipelines and warehouses using Pig without getting into the details of how the work is implemented and executed as Hadoop jobs. So, to answer your question: Pig is not there to complement Hadoop by adding features it lacks; it is simply a high-level framework built on top of Hadoop to make development faster.

You can certainly do everything Pig does with Hadoop directly, but try out some of Pig's advanced features and you will find that writing the equivalent Hadoop jobs by hand takes considerable time. Speaking loosely, the tasks that are generic and common across data engineering have already been implemented on top of bare Hadoop in the form of Pig; you just need to express them in Pig Latin to have them executed.
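One such pre-built task is the reduce-side join, which in raw MapReduce requires tagging records by source, custom partitioning, and a hand-written reducer. In Pig Latin it is a single JOIN. The file names and schemas below are hypothetical:

```pig
-- Two hypothetical CSV datasets
orders = LOAD 'orders.csv' USING PigStorage(',')
         AS (order_id:int, cust_id:int, amount:double);
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (cust_id:int, name:chararray);
-- One line replaces an entire hand-written reduce-side join
joined = JOIN orders BY cust_id, customers BY cust_id;
-- Fields in a join result are disambiguated as relation::field
totals = FOREACH (GROUP joined BY customers::name)
         GENERATE group AS name, SUM(joined.orders::amount) AS total;
DUMP totals;
```

Pig also chooses among join strategies (e.g. replicated joins for small tables via `USING 'replicated'`), which would each be a separate hand-rolled pattern in plain MapReduce.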

answered May 7, 2018 by Ashish
• 2,630 points
