How Impala is fast compared to Hive in terms of query response?

0 votes
I am querying large CSV data sets present in HDFS using Hive and Impala. I saw that I’m getting better response time with Impala compared to Hive for the queries.

Can anyone tell me some use cases where impala is best suited and where hive is best suited?

How impala is fast in terms of query response when compared to hive?
Mar 21, 2018 in Big Data Hadoop by coldcode
• 2,050 points

1 answer to this question.

0 votes

Impala provides faster response as it uses MPP(massively parallel processing) unlike Hive which uses MapReduce under the hood, which involves some initial overheads (as Charles sir has specified). Massively parallel processing is a type of computing that uses many separate CPUs running in parallel to execute a single program where each CPU has it's own dedicated memory. The very fact that Impala, being MPP based, doesn't involve the overheads of a MapReduce jobs viz. job setup and creation, slot assignment, split creation, map generation etc., makes it blazingly fast.

But that doesn't mean that Impala is the solution to all your problems. Being highly memory intensive (MPP), it is not a good fit for tasks that require heavy data operations like joins etc., as you just can't fit everything into the memory. This is where Hive is a better fit.

So, if you need real time, ad-hoc queries over a subset of your data go for Impala. And if you have batch processing kinda needs over your Big Data go for Hive.

answered Mar 21, 2018 by nitinrawat895
• 10,950 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Hadoop Hive: How to skip the first line of csv while loading in hive table?

You can try this: CREATE TABLE temp ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 69,030 points
+1 vote
1 answer

How to limit the number of rows per each item in a Hive QL?

SELECT a_id, b, c, count(*) as sumrequests FROM ...READ MORE

answered Nov 30, 2018 in Big Data Hadoop by Omkar
• 69,030 points
0 votes
2 answers

How to change the location of a table in hive?

Changing location requires 2 steps: 1.) Change location ...READ MORE

answered Feb 12 in Big Data Hadoop by Saksham Sehrawet
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,950 points
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,950 points
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyF ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,320 points
0 votes
7 answers

How to run a jar file in hadoop?

I used this command to run my ...READ MORE

answered Dec 10, 2018 in Big Data Hadoop by Dasinto
0 votes
12 answers

What is Zookeeper? What is the purpose of Zookeeper in Hadoop Ecosystem?

Hey, Apache Zookeeper says that it is a ...READ MORE

answered Apr 29, 2019 in Big Data Hadoop by Gitika
• 41,720 points