How to select a partition in Hive

0 votes

Could you please explain how to select a column for a partition?

Feb 19, 2019 in Big Data Hadoop by Karan
9,865 views

1 answer to this question.

0 votes

Follow these steps:

A. Create Database

------------------

create database retail123;

B. Select Database

------------------

use retail123;

C. Create table for storing transactional records

-------------------------------------------------

create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE,
category STRING, product STRING, city STRING, state STRING, spendby STRING)
row format delimited
fields terminated by ','
stored as textfile;

D. Load the data into the table

-------------------------------

LOAD DATA LOCAL INPATH 'txns1.txt' OVERWRITE INTO TABLE txnrecords;

E. Describing metadata or schema of the table

---------------------------------------------

describe txnrecords;

F. Counting the number of records

---------------------------------

select count(*) from txnrecords;

G. Counting total spending by category of products

--------------------------------------------------

select category, sum(amount) from txnrecords group by category;

H. Total spending of 10 customers

---------------------------------

select custno, sum(amount) from txnrecords group by custno limit 10;
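
Note that this query returns an arbitrary 10 customers, because LIMIT is applied without sorting. If the intent is the 10 highest-spending customers, sort the aggregate first. A minimal sketch (the alias total is my own naming, not part of the original answer):

-- Hypothetical variant: top 10 customers ranked by total spend
select custno, sum(amount) as total
from txnrecords
group by custno
order by total desc
limit 10;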

I. Create partitioned table

---------------------------

create table txnrecsByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE,
product STRING, city STRING, state STRING, spendby STRING)
partitioned by (category STRING)
clustered by (state) INTO 10 buckets
row format delimited
fields terminated by ','
stored as textfile;
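
Note that category appears only in the PARTITIONED BY clause and not in the column list: Hive treats partition columns as virtual columns of the table. If you only needed to load one known category, a static-partition insert would also work. A rough sketch, assuming a hypothetical category value of 'Gymnastics':

-- Hypothetical static-partition load for a single, known category value
-- (run after enabling bucketing enforcement as in step J below)
INSERT OVERWRITE TABLE txnrecsByCat PARTITION (category='Gymnastics')
select txnno, txndate, custno, amount, product, city, state, spendby
from txnrecords
where category = 'Gymnastics';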

J. Configure Hive to allow dynamic partitioning

-----------------------------------------------

A query that scans all partitions can trigger an enormous MapReduce job if the table data and the number of partitions are large, so putting Hive into strict mode is a commonly suggested safety measure: it prohibits queries against a partitioned table that lack a WHERE clause filtering on the partition column. The dynamic-partition insert in the next step, however, needs dynamic partitioning enabled, its mode set to nonstrict, and bucketing enforced, as in the following session:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.enforce.bucketing=true;
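
If you do want the safety net described above, the strict query mode can be turned on as well. A rough sketch, assuming the older hive.mapred.mode property (newer Hive releases replace it with hive.strict.checks.* settings):

-- Assumed property name; applies to older Hive releases
set hive.mapred.mode=strict;
-- In strict mode, Hive rejects a SELECT on a partitioned table unless the
-- WHERE clause filters on the partition column.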

K. Load data into the partitioned table

---------------------------------------

from txnrecords txn INSERT OVERWRITE TABLE txnrecsByCat PARTITION(category)
select txn.txnno, txn.txndate, txn.custno, txn.amount, txn.product, txn.city, txn.state,
txn.spendby, txn.category DISTRIBUTE BY category;

select * from  txnrecsByCat;
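
The statement above scans every partition. To read just one partition, which is what the question is really about, filter on the partition column so Hive prunes the rest; a small sketch (the value 'Gymnastics' is hypothetical):

-- List the partitions created by the dynamic insert
show partitions txnrecsByCat;

-- Select a single partition; only that partition's data is read because
-- the filter is on the partition column.
select * from txnrecsByCat where category = 'Gymnastics';
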
answered Feb 19, 2019 by Omkar
• 69,210 points
