Why is Impala faster than Hive in terms of query response time?

I am querying large CSV data sets stored in HDFS using both Hive and Impala, and I'm seeing better response times with Impala than with Hive for these queries.

Can anyone tell me some use cases where Impala is best suited and where Hive is best suited?

How does Impala achieve a faster query response than Hive?
Mar 21, 2018 in Big Data Hadoop by coldcode

1 answer to this question.

Impala responds faster because it uses MPP (massively parallel processing), unlike Hive, which classically runs queries as MapReduce jobs under the hood and so incurs some initial overhead (as Charles has specified). Massively parallel processing is a type of computing in which many separate CPUs run in parallel to execute a single program, with each CPU having its own dedicated memory. Because Impala is MPP-based, it avoids the overheads of a MapReduce job, viz. job setup and creation, slot assignment, split creation, map generation, etc., and that is what makes it so fast.
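To make the overhead argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: `EngineModel`, `estimated_latency`, and every number in it are made up for illustration and are not measurements of Hive or Impala. It assumes both engines scan at the same rate and differ only in the fixed cost paid per stage.

```python
from dataclasses import dataclass


@dataclass
class EngineModel:
    """Toy cost model; every number used with it is a made-up illustration."""
    name: str
    startup_s: float      # fixed cost paid once per stage (job setup, scheduling)
    scan_mb_per_s: float  # aggregate scan throughput across the cluster


def estimated_latency(engine: EngineModel, data_mb: float, stages: int) -> float:
    """Estimated seconds: per-stage startup cost plus time to scan the data."""
    return engine.startup_s * stages + data_mb / engine.scan_mb_per_s


# Hypothetical figures: a batch engine that launches a job per stage,
# versus long-running daemons that can start work almost immediately.
batch = EngineModel("batch (MapReduce-style)", startup_s=20.0, scan_mb_per_s=1000.0)
mpp = EngineModel("MPP (daemon-style)", startup_s=0.5, scan_mb_per_s=1000.0)

for data_mb in (1_000, 100_000, 10_000_000):
    b = estimated_latency(batch, data_mb, stages=2)
    m = estimated_latency(mpp, data_mb, stages=2)
    print(f"{data_mb:>10} MB  batch={b:9.1f}s  mpp={m:9.1f}s  ratio={b / m:5.2f}")
```

Under these assumptions the fixed startup cost dominates small queries and fades as the scanned volume grows, which is consistent with the speedup being most visible on short, interactive queries.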

But that doesn't mean Impala is the solution to all your problems. Being highly memory-intensive (MPP), it is not a good fit for tasks that involve heavy data operations such as large joins, because you can't always fit everything into memory. This is where Hive is a better fit.
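A quick way to reason about the memory constraint is to check whether the in-memory side of a join would fit across the cluster. The helper below is a hypothetical sketch, not part of Impala or Hive; the name `fits_in_memory`, the even-partitioning assumption, and the 0.8 usable fraction are all illustrative choices.

```python
def fits_in_memory(build_side_gb: float, nodes: int, mem_per_node_gb: float,
                   usable_fraction: float = 0.8) -> bool:
    """Return True if a join's in-memory side fits in the usable cluster memory.

    Assumes the data is partitioned evenly across nodes, which skewed
    join keys will not always satisfy in practice.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    per_node_share_gb = build_side_gb / nodes
    return per_node_share_gb <= mem_per_node_gb * usable_fraction


# Hypothetical cluster: 10 nodes with 64 GB each.
print(fits_in_memory(build_side_gb=200, nodes=10, mem_per_node_gb=64))   # 20 GB per node
print(fits_in_memory(build_side_gb=2000, nodes=10, mem_per_node_gb=64))  # 200 GB per node
```

If the check fails for your largest joins, that is a hint the workload leans toward the batch side of the trade-off.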

So, if you need real-time, ad hoc queries over a subset of your data, go for Impala. If you have batch-processing needs over your Big Data, go for Hive.

answered Mar 21, 2018 by nitinrawat895

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.