Hadoop: How to keep duplicates in Hive using collect_set()?

I want to keep duplicates in Hive when I aggregate with collect_set(). Example:

hash_id | num_of_cats
=====================
abcdef  | 5
abcdef  | 4
abcdef  | 3
fndflka | 1
fndflka | 2
fndflka | 3
djsb33  | 7
djsb33  | 7
djsb33  | 7

should return:

hash_agg | cats_aggregate
===========================
abcdef   | Array<int>(5,4,3)
fndflka  | Array<int>(1,2,3)
djsb33   | Array<int>(7,7,7)
Nov 2, 2018 in Big Data Hadoop by slayer

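Use COLLECT_LIST() instead of COLLECT_SET(). COLLECT_SET() removes duplicate values within each group, while COLLECT_LIST() keeps every value, so djsb33 comes back as (7,7,7):

SELECT
    hash_id, COLLECT_LIST(num_of_cats) AS cats_aggregate  -- keeps duplicates
FROM
    <tablename>
WHERE
    <condition>  -- optional; drop the WHERE clause to aggregate every row
GROUP BY
    hash_id
;

Note that Hive does not guarantee the order of the elements inside the returned array, so (5,4,3) may come back in a different order.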
answered Nov 2, 2018 by Omkar
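To make the difference between the two aggregates concrete, here is a plain-Python model of GROUP BY with COLLECT_LIST versus COLLECT_SET, applied to the sample rows from the question. It is only an illustration of the semantics, not Hive code: `collect_by_key` is a hypothetical helper, and the insertion-order output is an assumption of this model, since Hive itself does not promise any element order.

```python
from collections import defaultdict


def collect_by_key(rows, keep_duplicates=True):
    """Group (key, value) rows by key, mimicking GROUP BY + an array aggregate.

    keep_duplicates=True  models COLLECT_LIST: every value is kept.
    keep_duplicates=False models COLLECT_SET: repeated values in a group are dropped.
    Values are appended in input order (an assumption of this model only).
    """
    groups = defaultdict(list)
    for key, value in rows:
        if keep_duplicates or value not in groups[key]:
            groups[key].append(value)
    return dict(groups)


# Sample rows from the question: (hash_id, num_of_cats)
rows = [
    ("abcdef", 5), ("abcdef", 4), ("abcdef", 3),
    ("fndflka", 1), ("fndflka", 2), ("fndflka", 3),
    ("djsb33", 7), ("djsb33", 7), ("djsb33", 7),
]

as_list = collect_by_key(rows, keep_duplicates=True)   # like COLLECT_LIST
as_set = collect_by_key(rows, keep_duplicates=False)   # like COLLECT_SET

print(as_list["djsb33"])  # [7, 7, 7]
print(as_set["djsb33"])   # [7]
```

Only the djsb33 group differs between the two results, because it is the only group with repeated values; that is exactly the case the question is about.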