Difference between Pig and Hive? Why have both? [closed]

0 votes

My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS.

I understand that:

  • Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), whereas Hive's query language closely resembles SQL.

  • Pig sits on top of Hadoop and in principle can also sit on top of Dryad. I might be wrong but Hive is closely coupled to Hadoop.

  • Both Pig Latin and Hive commands compile to Map and Reduce jobs.
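
To make the first point above concrete, here is a minimal sketch in plain Python. It is an analogy only and uses neither Pig nor Hive: the sample data, the function names, and the use of SQLite as a stand-in for a SQL engine are all assumptions made for illustration. The first function names each intermediate step, roughly the way a Pig Latin script does; the second states the desired result in a single SQL statement, roughly the way a Hive query does.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (visitor, url, seconds_on_page)
VISITS = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 40),
    ("carol", "/docs", 25),
    ("bob", "/docs", 1),
]


def dataflow_style(rows, min_seconds=5):
    """Step-by-step pipeline, loosely in the spirit of a Pig Latin script."""
    # Step 1: filter rows (compare Pig Latin's FILTER ... BY)
    filtered = [r for r in rows if r[2] >= min_seconds]
    # Step 2: group by url (compare GROUP ... BY)
    grouped = defaultdict(list)
    for _visitor, url, seconds in filtered:
        grouped[url].append(seconds)
    # Step 3: aggregate each group (compare FOREACH ... GENERATE)
    return sorted((url, sum(secs)) for url, secs in grouped.items())


def declarative_style(rows, min_seconds=5):
    """One SQL statement, loosely in the spirit of a Hive query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visitor TEXT, url TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT url, SUM(seconds) FROM visits "
        "WHERE seconds >= ? GROUP BY url ORDER BY url",
        (min_seconds,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(dataflow_style(VISITS))
    print(declarative_style(VISITS))
```

Both functions return the same rows; the difference is only in who decides the order of the steps, the author of the script or the query engine.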

My question - What is the goal of having both when one (say Pig) could serve the purpose? Is it just because Pig is evangelized by Yahoo! and Hive by Facebook?

Nov 20, 2020 in Big Data Hadoop by Roshni
• 10,520 points

1 answer to this question.

0 votes

Hive was designed to appeal to a community comfortable with SQL. Its philosophy was that we don't need yet another scripting language. Hive supports map and reduce transform scripts in the language of the user's choice (which can be embedded within SQL clauses). It is widely used at Facebook by analysts comfortable with SQL as well as by data miners programming in Python. SQL compatibility efforts in Pig have been abandoned AFAIK, so the difference between the two projects is very clear.
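
As a rough illustration of what such a transform script can look like, here is a minimal Python sketch. Hive streams rows to the script on standard input as tab-separated lines and reads tab-separated lines back from standard output. The file name `extract_domain.py`, the table `page_views`, and its columns are hypothetical names chosen for this example.

```python
import sys
from urllib.parse import urlparse

# Hypothetical Hive usage (table and column names are assumptions):
#   ADD FILE extract_domain.py;
#   SELECT TRANSFORM (user_id, url)
#          USING 'python extract_domain.py'
#          AS (user_id, domain)
#   FROM page_views;


def transform(lines):
    """Map tab-separated (user_id, url) rows to (user_id, domain) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # this sketch simply skips malformed rows
        user_id, url = fields
        # Fall back to a placeholder when the value has no host part
        domain = urlparse(url).netloc or "unknown"
        yield f"{user_id}\t{domain}"


if __name__ == "__main__":
    # Read rows from stdin and write transformed rows to stdout
    for row in transform(sys.stdin):
        print(row)
```

Keeping the row logic in a generator separate from the stdin/stdout loop makes the script easy to try locally on a few sample lines before running it inside a query.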

Supporting SQL syntax also means that it's possible to integrate with existing BI tools like MicroStrategy. Hive has an ODBC/JDBC driver (that's a work in progress) that should allow this to happen in the near future. It's also beginning to add support for indexes, which should allow support for the drill-down queries common in such environments.

Finally (this is not directly pertinent to the question), Hive is a framework for performing analytic queries. While its dominant use is to query flat files, there's no reason why it cannot query other stores. Currently, Hive can be used to query data stored in HBase (which is a key-value store like those found in the guts of most RDBMSes), and the HadoopDB project has used Hive to query a federated RDBMS tier.

answered Nov 20, 2020 by Gitika
• 65,910 points
