How to select partition in Hive?


Could you please explain how to select a column for partitioning a table?

Feb 18 in Big Data Hadoop by Karan


Follow these steps:

A. Create Database

------------------

create database retail123;

B. Select Database

------------------

use retail123;

C. Create table for storing transactional records

-------------------------------------------------

create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE,
category STRING, product STRING, city STRING, state STRING, spendby STRING)
row format delimited
fields terminated by ','
stored as textfile;

D. Load the data into the table

-------------------------------

LOAD DATA LOCAL INPATH 'txns1.txt' OVERWRITE INTO TABLE txnrecords;
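To sanity-check the load, you can preview a few rows (this assumes txns1.txt is comma-delimited, matching the table definition above):

select * from txnrecords limit 5;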

E. Describing metadata or schema of the table

---------------------------------------------

describe txnrecords;

F. Counting the number of records

-------------------------

select count(*) from txnrecords;

G. Counting total spending by category of products

--------------------------------------------------

select category, sum(amount) from txnrecords group by category;

H. Total spending for 10 customers

--------------------

select custno, sum(amount) from txnrecords group by custno limit 10;

I. Create partitioned table

---------------------------

create table txnrecsByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE,
product STRING, city STRING, state STRING, spendby STRING)
partitioned by (category STRING)
clustered by (state) INTO 10 buckets
row format delimited
fields terminated by ','
stored as textfile;

Note that the partition column (category) is declared in the PARTITIONED BY clause, not in the regular column list.

J. Configure Hive to allow partitions

-------------------------------------

A query across all partitions can trigger an enormous MapReduce job if the table data and number of partitions are large. As a safety measure, Hive's dynamic partition mode defaults to strict, which requires at least one partition column to be specified statically in an INSERT. Since we want the category partition to be determined dynamically from the data, set the mode to nonstrict and enable dynamic partitioning and bucketing:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.enforce.bucketing=true;

K. Load data into partition table

----------------------------------

from txnrecords txn INSERT OVERWRITE TABLE txnrecsByCat PARTITION(category)
select txn.txnno, txn.txndate, txn.custno, txn.amount, txn.product, txn.city,
txn.state, txn.spendby, txn.category DISTRIBUTE BY category;

select * from txnrecsByCat;
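
The query above scans every partition. To actually select a specific partition, filter on the partition column in the WHERE clause; Hive then reads only that partition's directory instead of the whole table. You can first list the partitions that were created. (The value 'Electronics' below is just an assumed example; use SHOW PARTITIONS to see the real category values in your data.)

L. Select data from a specific partition

----------------------------------------

show partitions txnrecsByCat;

select * from txnrecsByCat where category='Electronics';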
answered Feb 18 by Omkar