Hadoop Hive partitioning

0 votes
When and Why do we use Hive Partitioning?
Dec 13, 2018 in Big Data Hadoop by digger
• 26,660 points
190 views

1 answer to this question.

0 votes

Partitioning:

Hive is one of the preferred tools for querying large datasets. When a table is not partitioned, every query reads all of the files in the table's data directory and applies the filters only as a subsequent step, which amounts to a full table scan. This becomes slow and expensive, especially for large tables.

Very often users need to filter the data on specific column values. To apply partitioning in Hive, users need to understand the domain of the data they are analysing.

With that knowledge it becomes easy to identify the frequently queried or accessed columns, and Hive's partitioning feature can then be applied to those columns.

Because partitions are horizontal slices of the data, large datasets can be separated into more manageable chunks.
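
For illustration only, here is a minimal sketch of declaring and loading a partitioned table. The sales table, its columns, and the sales_staging source table are hypothetical names, not something from the question:

-- Declare a table partitioned by country; each distinct country value
-- becomes its own subdirectory under the table's location.
CREATE TABLE sales (
    order_id  BIGINT,
    amount    DOUBLE,
    customer  STRING
)
PARTITIONED BY (country STRING)
STORED AS ORC;

-- Dynamic-partition insert: Hive routes each row into the subdirectory
-- for its country value (country=US/, country=IN/, ...).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE sales PARTITION (country)
SELECT order_id, amount, customer, country
FROM sales_staging;

Because each partition lives in its own subdirectory, Hive can skip entire directories when a query filters on the partition column, which is what makes the pruning described below possible.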

When to use Hive partitioning:

Use partitioning when the data contained in a table needs to be split into multiple sections based on the values of a column, typically one that queries filter on frequently.

Rows are segregated and stored under the partition that matches their partition-column value. When a query filters on that column, only the required partitions of the table are read, which reduces the time the query takes to return its result.
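
Continuing the hypothetical sales table sketched above, a query that filters on the partition column lets Hive prune partitions and read only the matching subdirectory:

-- Only the files under country=US/ are read (partition pruning),
-- instead of scanning the whole table.
SELECT customer, SUM(amount) AS total
FROM sales
WHERE country = 'US'
GROUP BY customer;

-- List the partitions Hive currently tracks for the table.
SHOW PARTITIONS sales;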

answered Dec 13, 2018 by Omkar
• 68,840 points
