Bucketing vs Partitioning in Hive

0 votes
What is the difference between Partitioning and Bucketing in Hive?
Jul 9, 2018 in Big Data Hadoop by shams
• 3,670 points

edited Jul 9, 2018 by shams 27,106 views

1 answer to this question.

0 votes
Partitioning divides a large amount of data into multiple slices based on the values of one or more table columns.

Assume that you are storing information about people across the entire world, spread over 196+ countries and spanning around 500 crore (5 billion) entries. If you want to query people from a particular country (say, Vatican City) without partitioning, you have to scan all 500 crore entries just to fetch the thousand or so entries for that country. If you partition the table by country, the query only needs to read the data in that one country's partition. Hive creates a separate directory for each distinct value of the partition column(s).
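To make the idea concrete, here is a small Python sketch of partition pruning. It is only a simulation and not Hive itself: the sample rows, the `partition_by_country` and `query_country` helpers, and the in-memory dictionary are all made up for illustration. The `country=<value>` keys mirror the `column=value` directory naming that Hive uses for partitions.

```python
from collections import defaultdict

# Hypothetical sample rows: (person_id, name, country)
rows = [
    (1, "Asha", "India"),
    (2, "Marco", "Vatican City"),
    (3, "Lucia", "Vatican City"),
    (4, "John", "USA"),
]

def partition_by_country(rows):
    """Group rows under one directory-style key per distinct country value."""
    partitions = defaultdict(list)
    for person_id, name, country in rows:
        partitions[f"country={country}"].append((person_id, name))
    return dict(partitions)

def query_country(partitions, country):
    """Read only the matching partition instead of scanning every row."""
    return partitions.get(f"country={country}", [])

partitions = partition_by_country(rows)
vatican = query_country(partitions, "Vatican City")
print(sorted(partitions))
print(vatican)
```

The lookup touches a single key, which is the in-memory analogue of Hive reading one partition directory and skipping the rest.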

Bucketing decomposes data into a fixed number of more manageable, roughly equal parts, assigning each row to a bucket based on a hash of the bucketing column.

With partitioning, you can end up with many small partitions, because one partition is created per distinct column value. With bucketing, you cap the number of buckets that store the data. This number is defined in the table creation script.
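The fixed bucket count can be illustrated with another small Python sketch. This is a simplification under stated assumptions: `NUM_BUCKETS`, `bucket_for`, and `bucketize` are hypothetical names, and the plain modulo on an integer ID stands in for Hive's real hash function, which depends on the column type and Hive version.

```python
NUM_BUCKETS = 4  # fixed up front, like the bucket count declared at table creation

def bucket_for(person_id, num_buckets=NUM_BUCKETS):
    """Assumed rule for this sketch: integer key modulo the bucket count."""
    return person_id % num_buckets

def bucketize(ids, num_buckets=NUM_BUCKETS):
    """Distribute IDs across a fixed number of buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for person_id in ids:
        buckets[bucket_for(person_id, num_buckets)].append(person_id)
    return buckets

buckets = bucketize(range(1, 101))
sizes = [len(b) for b in buckets]
print(len(buckets), sizes)
```

However many distinct IDs arrive, the number of buckets stays at the declared value, whereas partitioning by the same column would create one partition per distinct ID.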

Hope this helps.
answered Jul 9, 2018 by Data_Nerd
• 2,390 points
