As you know writing mapreduce programs in Java or any other language is quite complex. You may have to write a 100 lines of java code for doing a simple sort. Also, in a company, you have many people (analyst) who are quite comfortable with writing SQL like queries and therefore, wanted similar kind of functionalities out of the box from Hadoop. Now, following are the features that Pig provides:
- Pig Latin is a high-level data flow language, whereas MapReduce is a low-level data processing paradigm.
- Without writing complex Java implementations in MapReduce, programmers can achieve the same implementations very easily using Pig Latin.
- Apache Pig uses multi-query approach (i.e. using a single query of Pig Latin we can accomplish multiple MapReduce tasks), which reduces the length of the code by 20 times. Hence, this reduces the development period by almost 16 times.
- Pig provides many built-in operators to support data operations like joins, filters, ordering, sorting etc. Whereas to perform the same function in MapReduce is a humongous task.
- In addition, it also provides nested data types like tuples, bags, and maps that are missing from MapReduce. I will explain you these data types in a while.
Basically, Pig provides an abstraction to avoid the complexity of writing MapReduce programming, providing various query operations like joins, group by, etc. out of the box. This makes life of a data engineer easier for managing and performing different ad hoc queries on the data.
Now, the keywords above are high-level and abstracts . The way we have DBAs who can create/manage databases without knowledge of any major programming language but for SQL, similarly we can have data-engineers creating/managing data-pipelines/warehouses using Pig without getting into the complexities of how/what is being implemented/executed as hadoop jobs. So, to answer your question, Pig is not there to complement Hadoop in any of the features it is lacking, but it is just a high-level framework built on top hadoop to do things faster(development time).
You can certainly do everything what Pig does with Hadoop, but try out some advanced features of Pig and writing hadoop jobs for them will take some real good time. So, speaking very liberally, some of the tasks which are generic/common throughout data engineering have been implemented in bare hadoop before-hand in form of Pig, you just need to tell it in Pig-Latin to be executed.