Bucketing vs Partitioning in Hive

0 votes
What is the difference between Partitioning and Bucketing in Hive?
Jul 9, 2018 in Big Data Hadoop by shams
• 3,580 points

edited Jul 9, 2018 by shams 6,070 views

1 answer to this question.

0 votes
Partitioning divides a large amount of data into multiple slices based on the values of one or more table columns.

Assume you are storing information about people from all over the world, spread across 196+ countries and totalling around 500 crore (5 billion) entries. If you want to query people from one particular country (say, Vatican City), then without partitioning you have to scan all 500 crore entries just to fetch the thousand or so entries for that country. If you partition the table by country, the query only has to read the data in that one country's partition. Hive creates a separate directory for each distinct value of the partition column(s).
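To make the idea concrete, here is a minimal Python sketch of the country example. It is not Hive code: the sample rows, the `/warehouse/people` path and both function names are made up for illustration. It only mimics how each distinct country value gets its own `country=<value>` directory, so a lookup for one country touches a single directory.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, country)
rows = [
    ("Alice", "India"),
    ("Bob", "Vatican City"),
    ("Carla", "Vatican City"),
    ("Deepak", "India"),
    ("Erik", "Norway"),
]

def partition_by_country(rows, table_dir="/warehouse/people"):
    """Group rows under one directory-like key per distinct country value."""
    partitions = defaultdict(list)
    for name, country in rows:
        # The path is an assumption; Hive names partition dirs column=value.
        partitions[f"{table_dir}/country={country}"].append(name)
    return dict(partitions)

def query_country(partitions, country, table_dir="/warehouse/people"):
    """Read only the matching partition instead of scanning every row."""
    return partitions.get(f"{table_dir}/country={country}", [])

partitions = partition_by_country(rows)
vatican = query_country(partitions, "Vatican City")
```

Note that the number of directories grows with the number of distinct values, which is why a column with very many distinct values makes a poor partition key.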

Bucketing decomposes data into a fixed number of more manageable, roughly equal parts by hashing the value of a chosen column.

With partitioning, you can end up with many small partitions, because one partition is created for each distinct column value. With bucketing, you restrict the data to a fixed number of buckets. This number is defined in the table creation script.
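The fixed bucket count can be sketched in Python as well. This is only an illustration under assumptions: `zlib.crc32` stands in for Hive's own hash function (which depends on the column type), and the key names and the bucket count of 4 are made up. The point is that the number of buckets stays the same no matter how many distinct keys arrive.

```python
import zlib

NUM_BUCKETS = 4  # fixed up front, like the bucket count in a table creation script

def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket index in [0, num_buckets) via a stable hash."""
    # zlib.crc32 is a stand-in; Hive uses its own type-specific hash.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

# Hypothetical keys: 1000 distinct user ids
user_ids = [f"user{i}" for i in range(1000)]

buckets = {b: [] for b in range(NUM_BUCKETS)}
for uid in user_ids:
    buckets[bucket_for(uid)].append(uid)

# Number of keys that landed in each bucket
sizes = [len(buckets[b]) for b in range(NUM_BUCKETS)]
print(sizes)
```

The same key always lands in the same bucket, and 1000 distinct keys still produce only 4 buckets, whereas partitioning on the same column would produce 1000 partitions.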

Hope this helps.
answered Jul 9, 2018 by Data_Nerd
• 2,370 points
