How to select a partition in Hive?

Could you please explain how to select a column for a partition?

Feb 18 in Big Data Hadoop by Karan

1 answer to this question.


Follow these steps:

A. Create Database

------------------

create database retail123;

B. Select Database

------------------

use retail123;

C. Create table for storing transactional records

-------------------------------------------------

create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE,
category STRING, product STRING, city STRING, state STRING, spendby STRING)
row format delimited
fields terminated by ','
stored as textfile;

D. Load the data into the table

-------------------------------

LOAD DATA LOCAL INPATH 'txns1.txt' OVERWRITE INTO TABLE txnrecords;

E. Describe the metadata or schema of the table

-----------------------------------------------

describe txnrecords;

F. Count the number of records

------------------------------

select count(*) from txnrecords;

G. Total spending by product category

-------------------------------------

select category, sum(amount) from txnrecords group by category;

H. Total spending for 10 customers

----------------------------------

select custno, sum(amount) from txnrecords group by custno limit 10;

I. Create partitioned table

---------------------------

create table txnrecsByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE,
product STRING, city STRING, state STRING, spendby STRING)
partitioned by (category STRING)
clustered by (state) INTO 10 buckets
row format delimited
fields terminated by ','
stored as textfile;
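
To confirm which column Hive treats as the partition column, you can inspect the table definition right after creating it. This is a minimal sketch that uses only the table from step I:

```sql
-- category is declared only in PARTITIONED BY, not in the regular column
-- list, yet Hive still exposes it as a column of the table.
describe formatted txnrecsByCat;
```

The output should list category in a separate partition information section, apart from the regular columns. That is how you can tell it is stored as a directory name rather than inside the data files.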

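J. Configure Hive to allow dynamic partitions

---------------------------------------------

The insert in step K lets Hive derive the partition value (category) from each row, which is called dynamic partitioning. Dynamic partitioning must be enabled, and its mode must be nonstrict, because strict mode (the default in older releases) requires at least one partition column to be given a static value. The third setting makes Hive honor the table's bucketing when it writes the data; as far as I know, Hive 2.0 and later always enforce bucketing and no longer need it.

Do not confuse this with strict query mode (hive.mapred.mode=strict), which prohibits queries on partitioned tables without a WHERE clause that filters on partitions. That is a highly recommended safety measure, because a query across all partitions can trigger an enormous MapReduce job when the table data and the number of partitions are large.

Set the options as in the following session: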

set hive.exec.dynamic.partition.mode=nonstrict;

set hive.exec.dynamic.partition=true;

set hive.enforce.bucketing=true;
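
To check that the settings took effect in the current session, you can ask Hive to print them back. This sketch relies on the standard behavior that set with a key and no value prints the current value:

```sql
-- Each statement prints key=value for the current session.
set hive.exec.dynamic.partition.mode;
set hive.exec.dynamic.partition;
```

These settings last only for the session, so they have to be issued again (or placed in the Hive configuration) before running the insert in a new session.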

K. Load data into partition table

----------------------------------

from txnrecords txn INSERT OVERWRITE TABLE txnrecsByCat PARTITION(category)
select txn.txnno, txn.txndate, txn.custno, txn.amount, txn.product, txn.city, txn.state,
txn.spendby, txn.category DISTRIBUTE BY category;

L. Query the partitioned table

------------------------------

select * from txnrecsByCat;
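
To read just one partition instead of the whole table, filter on the partition column. The sketch below assumes a category value of 'Gymnastics', which is hypothetical; use a value that show partitions actually lists for your data. The hive.mapred.mode property is the classic way to turn on strict query mode; newer releases deprecate it in favor of the hive.strict.checks.* properties, so treat that part as an assumption about your Hive version.

```sql
-- List the partitions that the dynamic insert created.
show partitions txnrecsByCat;

-- Select a single partition by filtering on the partition column.
-- 'Gymnastics' is a hypothetical value, not taken from the original data.
select * from txnrecsByCat where category = 'Gymnastics';

-- Optional: in strict mode Hive rejects queries on partitioned tables
-- that have no partition filter, so this filtered query is still allowed.
set hive.mapred.mode=strict;
select count(*) from txnrecsByCat where category = 'Gymnastics';
set hive.mapred.mode=nonstrict;
```

Because category is the partition column, Hive only has to read the category=Gymnastics directory for these queries rather than scanning every partition.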
answered Feb 18 by Omkar
