Bucketing vs Partitioning in Hive

What is the difference between Partitioning and Bucketing in Hive?
Jul 9, 2018 in Big Data Hadoop by shams
• 3,580 points

edited Jul 9, 2018 by shams 2,728 views

1 answer to this question.

Partitioning divides a large amount of data into multiple slices based on the values of one or more table columns.

Suppose you store records for people across the entire world, spanning 196+ countries and around 500 crore (5 billion) entries. To query people from one particular country (say, Vatican City) without partitioning, Hive has to scan all 500 crore entries just to fetch the thousand or so that belong to that country. If you partition the table by country, a query that filters on country reads only the data in that one country's partition. Hive creates a separate directory for each distinct value of the partition column(s).
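To make the directory idea concrete, here is a minimal Python sketch of it. It is not Hive code, only an illustration: the sample rows, the `/warehouse/people` root path, and the helper names `partition_dir` and `query_country` are all made up for this example. It mimics the `column=value` directory layout and shows that a query on one country only touches that country's directory.

```python
from collections import defaultdict

# Hypothetical sample rows (name, country); not real data.
rows = [
    ("Alice", "India"),
    ("Bob", "Vatican City"),
    ("Carla", "India"),
    ("Dino", "Vatican City"),
    ("Eve", "Brazil"),
]

TABLE_ROOT = "/warehouse/people"  # assumed path, for illustration only


def partition_dir(table_root, column, value):
    """Build a Hive-style partition directory name: <root>/<column>=<value>."""
    return f"{table_root}/{column}={value}"


# One "directory" per distinct country value, as partitioning would lay it out.
partitions = defaultdict(list)
for name, country in rows:
    partitions[partition_dir(TABLE_ROOT, "country", country)].append(name)


def query_country(country):
    """Read only the directory that matches the filter, skipping all others."""
    return partitions.get(partition_dir(TABLE_ROOT, "country", country), [])


vatican = query_country("Vatican City")
print(vatican)
```

The lookup goes straight to one directory instead of looping over every row, which is the saving partitioning is meant to give you.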

Bucketing decomposes data into a fixed number of more manageable, roughly equal parts, based on a hash of the bucketing column(s).

With partitioning, you can end up with many small partitions, because one is created for every distinct column value. With bucketing, you restrict the data to a fixed number of buckets. This number is specified in the table creation statement.
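The fixed bucket count can be sketched in Python as well. Again, this is only an illustration and not Hive's implementation: `NUM_BUCKETS = 4` is an arbitrary value, `bucket_for` is a made-up helper, and `zlib.crc32` is a stand-in hash (Hive uses its own hash function). The point is that however many distinct keys arrive, they always land in the same fixed set of buckets.

```python
import zlib

NUM_BUCKETS = 4  # fixed up front, like the bucket count in a table definition


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket index in the range [0, num_buckets).

    crc32 is a stand-in used for illustration; it is not Hive's hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


# 100 distinct hypothetical ids still end up in only NUM_BUCKETS buckets.
buckets = {i: [] for i in range(NUM_BUCKETS)}
for user_id in range(1, 101):
    buckets[bucket_for(user_id)].append(user_id)

sizes = {i: len(ids) for i, ids in buckets.items()}
print(sizes)
```

Compare this with partitioning on the same column, which would have produced 100 separate partitions, one per id.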

Hope this helps.
answered Jul 9, 2018 by Data_Nerd
• 2,360 points
