Bucketing vs Partitioning in Hive

What is the difference between Partitioning and Bucketing in Hive?
Jul 9, 2018 in Big Data Hadoop by shams
• 3,580 points

edited Jul 9, 2018 by shams 1,638 views

1 answer to this question.

Partitioning divides a large amount of data into multiple slices based on the values of one or more table columns.

Suppose you store records for people across the entire world: 196+ countries and around 500 crore (5 billion) entries. To query people from one particular country (say, Vatican City) without partitioning, you have to scan all 500 crore entries just to fetch the thousand or so entries for that country. If you partition the table by country, the query only has to read the data in that one country's partition. Hive creates a separate directory for each distinct value of the partition column(s).
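To make the directory-per-value idea concrete, here is a small Python simulation. It is only a sketch, not Hive code: the sample rows, the country codes, the `/warehouse/people` path and both function names are made up for illustration. The one thing taken from Hive is the `column=value` naming of partition directories.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, country_code). A real table would hold billions.
people = [
    ("Alice", "IN"),
    ("Paolo", "VA"),
    ("Bob", "IN"),
    ("Maria", "VA"),
    ("Chen", "CN"),
]

TABLE_DIR = "/warehouse/people"  # assumed location, for illustration only


def partition_by_country(rows):
    """Group rows under one directory per country value, using the
    column=value directory naming that Hive uses for partitions."""
    partitions = defaultdict(list)
    for name, country in rows:
        partitions[f"{TABLE_DIR}/country={country}"].append(name)
    return dict(partitions)


def query_country(partitions, country):
    """Look up only the directory for the requested country; the other
    partitions are never read."""
    return partitions.get(f"{TABLE_DIR}/country={country}", [])


partitions = partition_by_country(people)
vatican = query_country(partitions, "VA")
print(sorted(partitions))
print(vatican)
```

A query filtered on the partition column touches a single directory, which is why the scan no longer depends on the size of the whole table.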

Bucketing decomposes data into a fixed number of more manageable parts of roughly equal size, by hashing the value of the bucketing column.

With partitioning, you can end up with many small partitions, because one is created for every distinct column value. With bucketing, you restrict the data to a fixed number of buckets. This number is defined in the table creation script.
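The fixed bucket count can be illustrated the same way. This is a rough Python sketch under stated assumptions: `NUM_BUCKETS`, `bucket_for` and the sample ids are hypothetical, and the integer id itself is used as the hash value. Hive's real hash function depends on the column type and the Hive version, so treat this only as a picture of the "hash modulo number of buckets" idea.

```python
NUM_BUCKETS = 4  # assumed value; in Hive this is fixed when the table is created


def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Map a row to a bucket: hash the bucketing column, then take it
    modulo the number of buckets. Here the integer id stands in for the hash."""
    return (user_id & 0x7FFFFFFF) % num_buckets


# Distribute 20 hypothetical user ids across the buckets.
buckets = {b: [] for b in range(NUM_BUCKETS)}
for user_id in range(1, 21):
    buckets[bucket_for(user_id)].append(user_id)

for b, ids in buckets.items():
    print(b, ids)
```

However many distinct ids arrive, the number of buckets never grows. That is the contrast with partitioning, where every new column value adds another directory.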

Hope this helps.
answered Jul 9, 2018 by Data_Nerd
• 2,340 points


© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.