Hadoop: How to keep duplicates in Hive using collect_set()?

I am aggregating values per key in Hive with collect_set(), but it drops duplicate values and I need to keep them. Example input:

hash_id | num_of_cats
=====================
abcdef  | 5
abcdef  | 4
abcdef  | 3
fndflka | 1
fndflka | 2
fndflka | 3
djsb33  | 7
djsb33  | 7
djsb33  | 7

The query should return:

hash_agg | cats_aggregate
===========================
abcdef   | Array<int>(5,4,3)
fndflka  | Array<int>(1,2,3)
djsb33   | Array<int>(7,7,7)
Nov 2, 2018 in Big Data Hadoop by slayer

1 answer to this question.

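collect_set() removes duplicates by design, so it cannot keep them. Use collect_list() instead, which returns every value in the group, duplicates included:

SELECT
    hash_id AS hash_agg,
    COLLECT_LIST(num_of_cats) AS cats_aggregate
FROM
    <tablename>
WHERE
    <condition>
GROUP BY
    hash_id
;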
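
To compare the two functions on the sample rows from the question, here is a minimal sketch. It assumes a Hive version with CTE support and the stack() table function, and the CTE name cats and the column alias distinct_cats are made up for illustration. The order of elements inside each array is not guaranteed unless you sort the input or the result yourself.

```sql
-- Build the question's sample rows inline (hypothetical CTE name: cats).
WITH cats AS (
    SELECT stack(
        9,
        'abcdef',  5,
        'abcdef',  4,
        'abcdef',  3,
        'fndflka', 1,
        'fndflka', 2,
        'fndflka', 3,
        'djsb33',  7,
        'djsb33',  7,
        'djsb33',  7
    ) AS (hash_id, num_of_cats)
)
SELECT
    hash_id                   AS hash_agg,
    collect_set(num_of_cats)  AS distinct_cats,   -- duplicates removed: djsb33 has one 7
    collect_list(num_of_cats) AS cats_aggregate   -- duplicates kept: djsb33 has three 7s
FROM cats
GROUP BY hash_id;
```

Running both aggregates side by side makes the difference visible on the djsb33 group, which is the only one in the sample with repeated values.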
answered Nov 2, 2018 by Omkar
