Why is Apache Pig used instead of writing Hadoop MapReduce directly?

0 votes
I understand that Pig runs on top of Apache Hadoop and uses its own language, called Pig Latin. But I am confused about why Pig was developed in the first place. In other words, I would like to know which features Pig provides that Apache Hadoop does not. Why should I take on the overhead of learning a new language, Pig Latin?
May 7, 2018 in Big Data Hadoop by Meci Matt
• 9,420 points
192 views

1 answer to this question.

0 votes
As you know, writing MapReduce programs in Java or any other language is quite complex. You may have to write around a hundred lines of Java code just to do a simple sort. Also, in a company you have many people (analysts) who are comfortable writing SQL-like queries and therefore want similar functionality out of the box from Hadoop. Pig provides the following features:
  • Pig Latin is a high-level data flow language, whereas MapReduce is a low-level data processing paradigm.
  • Without writing complex Java implementations in MapReduce, programmers can achieve the same implementations very easily using Pig Latin.
  • Apache Pig uses a multi-query approach (i.e. a single Pig Latin query can accomplish multiple MapReduce tasks), which is often said to cut the code length by roughly 20 times, and development time drops correspondingly.
  • Pig provides many built-in operators for data operations such as joins, filters, and ordering, whereas implementing the same functionality in MapReduce is a humongous task.
  • In addition, it provides nested data types such as tuples, bags, and maps, which MapReduce lacks.
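To make the brevity claim concrete, here is the classic word count, which takes dozens of lines of Java MapReduce, written as a handful of Pig Latin statements. The input and output paths and the schema are placeholders, not anything from the original question:

```pig
-- Load raw text lines (path and schema are illustrative)
lines = LOAD '/data/input.txt' AS (line:chararray);
-- TOKENIZE splits a line into a bag of words; FLATTEN turns the bag into rows
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- GROUP + COUNT is the whole "reduce" phase
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
ordered = ORDER counts BY cnt DESC;
STORE ordered INTO '/data/wordcount_out';
```

Each of these statements compiles down to one or more MapReduce stages; the programmer never touches a Mapper or Reducer class.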
Basically, Pig provides an abstraction that avoids the complexity of writing MapReduce programs, offering query operations like joins and group-by out of the box. This makes a data engineer's life easier when managing data and running ad hoc queries on it.

 

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation that makes MapReduce programming high-level, similar to what SQL does for RDBMS systems. Pig Latin can be extended using UDFs (User-Defined Functions), which the user can write in Java, Python, or JavaScript and then call directly from the language.
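As a sketch of the UDF mechanism, here is a hypothetical Python function registered via Jython and called from Pig Latin. The file name `udfs.py`, the function `normalize`, and the data file are all assumptions made up for illustration:

```python
# udfs.py -- hypothetical Python UDF for Pig (runs under Jython)
# @outputSchema tells Pig the type of the value this function returns
@outputSchema("name:chararray")
def normalize(s):
    # Trim whitespace and lowercase; pass nulls through unchanged
    return s.strip().lower() if s else s
```

```pig
-- Register the Python file so its functions are callable from Pig Latin
REGISTER 'udfs.py' USING jython AS myfuncs;
users = LOAD 'users.tsv' AS (name:chararray, city:chararray);
-- Call the UDF like any built-in function
cleaned = FOREACH users GENERATE myfuncs.normalize(name) AS name, city;
```

The same extension point exists for Java and JavaScript UDFs; Python via Jython is just the lightest-weight option to sketch.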

Now, the keywords above are high-level and abstracts. Just as DBAs can create and manage databases without knowing any major programming language beyond SQL, data engineers can create and manage data pipelines and warehouses using Pig without getting into the details of how the work is implemented and executed as Hadoop jobs. So, to answer your question: Pig is not there to complement Hadoop by adding features it lacks; it is simply a high-level framework built on top of Hadoop to make development faster.

You can certainly do everything Pig does with Hadoop directly, but try out some of Pig's advanced features and you will find that writing the equivalent Hadoop jobs by hand takes considerable time. Speaking loosely, the tasks that are generic and common across data engineering have already been implemented on top of bare Hadoop in the form of Pig; you just need to express them in Pig Latin to have them executed.
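One such pre-built task is the reduce-side join, which in raw MapReduce requires tagging records by source, custom partitioning, and a hand-written reducer. In Pig Latin it is a single JOIN. The file names and schemas below are hypothetical:

```pig
-- Two hypothetical CSV datasets
orders = LOAD 'orders.csv' USING PigStorage(',')
         AS (order_id:int, cust_id:int, amount:double);
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (cust_id:int, name:chararray);
-- One line replaces an entire hand-written reduce-side join
joined = JOIN orders BY cust_id, customers BY cust_id;
-- Fields in a join result are disambiguated as relation::field
totals = FOREACH (GROUP joined BY customers::name)
         GENERATE group AS name, SUM(joined.orders::amount) AS total;
DUMP totals;
```

Pig also chooses among join strategies (e.g. replicated joins for small tables via `USING 'replicated'`), which would each be a separate hand-rolled pattern in plain MapReduce.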

answered May 7, 2018 by Ashish
• 2,630 points
