Hadoop Hive partitioning

When and Why do we use Hive Partitioning?
Dec 13, 2018 in Big Data Hadoop by digger

Partitioning:

Hive has been one of the preferred tools for running queries over large datasets, especially queries that perform a full table scan.

In a table that is not partitioned, Hive reads all the files in the table's data directory and only then applies the query's filters to the data. Every row has to be read even when only a few rows match, so this becomes slow and expensive, especially for large tables.
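To make that cost concrete, here is a minimal Python sketch (an illustration, not Hive code) that models an unpartitioned table as a flat list of data files. The names full_scan_query and files and the sample rows are hypothetical. Every file is read before the filter is applied, so the number of rows read equals the table size no matter how selective the filter is.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

def full_scan_query(files: List[List[Row]],
                    predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Model of an unpartitioned table: read every row, then filter."""
    rows_read = 0
    matches: List[Row] = []
    for file_rows in files:          # every file in the table directory is opened
        for row in file_rows:
            rows_read += 1           # the row is read whether or not it matches
            if predicate(row):       # the filter is applied only after reading
                matches.append(row)
    return matches, rows_read

# Hypothetical table stored as three data files.
files = [
    [{"country": "US", "user": "a"}, {"country": "IN", "user": "b"}],
    [{"country": "IN", "user": "c"}, {"country": "UK", "user": "d"}],
    [{"country": "US", "user": "e"}],
]

matches, rows_read = full_scan_query(files, lambda r: r["country"] == "IN")
print(len(matches), rows_read)  # 2 5
```

Only 2 of the 5 rows match, yet all 5 are read.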

Very often, users need to filter data on specific column values. To apply partitioning in Hive effectively, users need to understand the domain of the data they are analyzing.

With this knowledge, it becomes easy to identify the columns that are queried or accessed most frequently, and Hive's partitioning feature can then be applied to those columns.

Because partitions are horizontal slices of data (each partition holds a subset of the table's rows), a large dataset can be separated into smaller, more manageable chunks.
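The horizontal slicing can be sketched in Python as grouping rows by the value of one partition column. This is a simplified model, not Hive's implementation: it assumes a single partition column, and partition_rows and the sample data are hypothetical. Each slice is keyed by a Hive-style directory name of the form column=value. The partition column's value is dropped from the stored row because it is carried by the directory name, which mirrors how Hive keeps partition values out of the data files.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]

def partition_rows(rows: List[Row], partition_col: str) -> Dict[str, List[Row]]:
    """Split rows into horizontal slices keyed by 'column=value' directory names."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        key = f"{partition_col}={row[partition_col]}"  # e.g. "country=IN"
        # The partition value lives in the directory name, not in the stored row.
        stored = {k: v for k, v in row.items() if k != partition_col}
        partitions[key].append(stored)
    return dict(partitions)

# Hypothetical rows of a table to be partitioned by country.
rows = [
    {"country": "US", "user": "a"},
    {"country": "IN", "user": "b"},
    {"country": "IN", "user": "c"},
    {"country": "UK", "user": "d"},
    {"country": "US", "user": "e"},
]

parts = partition_rows(rows, "country")
for path in sorted(parts):
    print(path, len(parts[path]))
# country=IN 2
# country=UK 1
# country=US 2
```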

When to use Hive partitioning:

Partitioning is recommended when you want the data in a table to be split into multiple sections, typically by the values of the columns that queries filter on most often.

Rows are segregated by the values of the partition column(s) and stored in the corresponding partition, which Hive keeps as a separate subdirectory under the table's data directory (for example, a directory named country=IN). When a query filters on a partition column, only the required partitions are read, which reduces the time the query takes to return its result.
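The pruning step can be modelled in a few lines of Python. This is a simplified sketch, not Hive's query planner: it assumes a single partition column and an equality filter on that column, and pruned_query and the sample partitions are hypothetical. Only the directory whose name matches the filter is opened, so rows read drop from the table size to the size of one partition.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

def pruned_query(partitions: Dict[str, List[Row]],
                 partition_col: str, value: str) -> Tuple[List[Row], int]:
    """Answer 'WHERE partition_col = value' by reading only the matching partition."""
    target = f"{partition_col}={value}"
    rows_read = 0
    matches: List[Row] = []
    for path, rows in partitions.items():
        if path != target:
            continue                 # pruned: this directory is never read
        for row in rows:
            rows_read += 1
            matches.append(row)
    return matches, rows_read

# Hypothetical table already split into Hive-style partition directories.
partitions = {
    "country=IN": [{"user": "b"}, {"user": "c"}],
    "country=UK": [{"user": "d"}],
    "country=US": [{"user": "a"}, {"user": "e"}],
}

matches, rows_read = pruned_query(partitions, "country", "IN")
total = sum(len(rows) for rows in partitions.values())
print(len(matches), rows_read, total)  # 2 2 5
```

In Hive itself, the equivalent is declaring the column with PARTITIONED BY (country STRING) in CREATE TABLE and filtering on that column in the WHERE clause.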

answered Dec 13, 2018 by Omkar


© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.