How to limit the number of rows per item in a Hive QL query?

0 votes

Say I have multiple items listed in a WHERE clause.
How do I limit the result to N rows for each item in the list?

Example:

select a_id, b, c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000
Nov 30, 2018 in Big Data Hadoop by slayer
• 29,050 points
615 views

1 answer to this question.

0 votes
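SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    -- Number the rows within each a_id. The alias is "rn" rather than "row"
    -- because ROW is a reserved keyword in newer Hive versions.
    SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
    WHERE a_id IN (1, 2, 3)
    ) rs
WHERE rn <= 10000
GROUP BY a_id, b, c;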

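Try the above code. It keeps up to 10,000 rows per a_id and then aggregates them. Because the window has no ORDER BY, which rows are kept is arbitrary and may differ between runs. If you want the limit to apply to more than just a_id, add the other columns to the PARTITION BY clause.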
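
If what you actually want is to cap the number of grouped output rows per a_id (rather than the raw rows that feed the count), one possible variant is to aggregate first and rank afterwards. This is only a sketch: the limit of 100, the ordering by sumrequests, and the aliases agg, ranked and rn are assumptions for illustration, not something from the question.

```sql
-- Sketch: keep at most 100 (a_id, b, c) groups per a_id,
-- preferring the groups with the highest request count.
SELECT a_id, b, c, sumrequests
FROM (
    SELECT a_id, b, c, sumrequests,
           row_number() OVER (PARTITION BY a_id ORDER BY sumrequests DESC) AS rn
    FROM (
        -- Aggregate first so the counts reflect all matching rows
        SELECT a_id, b, c, count(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) agg
) ranked
WHERE rn <= 100;
```

With an ORDER BY in the window the result is deterministic apart from ties, and the counts are not truncated by the per-item limit.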

answered Nov 30, 2018 by Omkar
• 67,290 points
