Hadoop: How to keep duplicates in Hive using collect_set()?

0 votes

I want to keep the duplicates in hive when I use collect_set(). Example:

hash_id | num_of_cats
=====================
abcdef            5
abcdef            4
abcdef            3
fndflka            1
fndflka            2
fndflka            3
djsb33            7
djsb33            7
djsb33            7

should return:

hash_agg | cats_aggregate
===========================
abcdef   Array<int>(5,4,3)
fndflka   Array<int>(1,2,3)
djsb33   Array<int>(7,7,7)
Nov 2, 2018 in Big Data Hadoop by slayer
• 29,040 points
134 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes
SELECT
    hash_id, COLLECT_LIST(num_of_cats) AS aggr_set
FROM
    <tablename>
WHERE
    <condition>
GROUP BY
    hash_id
;
answered Nov 2, 2018 by Omkar
• 65,850 points

Related Questions In Big Data Hadoop

+1 vote
2 answers

How to authenticate username & password while using Connector for Cloudera Hadoop in Tableau?

Hadoop server installed was kerberos enabled server. ...READ MORE

answered Aug 21, 2018 in Big Data Hadoop by Priyaj
• 56,120 points
125 views
0 votes
1 answer

Hadoop Hive: How to insert data in Hive table?

First, copy data into HDFS. Then create ...READ MORE

answered Nov 12, 2018 in Big Data Hadoop by Omkar
• 65,850 points
101 views
0 votes
1 answer

Hadoop Hive Hbase: How to insert data into Hbase using Hive (JSON file)?

You can use the get_json_object function to parse the ...READ MORE

answered Nov 15, 2018 in Big Data Hadoop by Omkar
• 65,850 points
242 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,070 points
1,679 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 9,070 points
132 views
0 votes
10 answers

hadoop fs -put command?

copy command can be used to copy files ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Sujay
8,152 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
572 views
0 votes
1 answer

Hadoop Hive: How to split string in Hive?

You can use the split function along ...READ MORE

answered Nov 6, 2018 in Big Data Hadoop by Omkar
• 65,850 points
784 views
0 votes
1 answer

Hadoop Hive: How to skip the first line of csv while loading in hive table?

You can try this: CREATE TABLE temp ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 65,850 points
370 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.