How to limit the number of rows per each item in a Hive QL?

0 votes

Say I have multiple items listed in a where clause.
How do I limit to N for each item in the list?

EX:

select a_id,b,c, count(*), as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000
Nov 30, 2018 in Big Data Hadoop by slayer
• 29,170 points
1,320 views

1 answer to this question.

0 votes
SELECT a_id, b, c, count(*) as sumrequests
FROM (
    SELECT a_id, b, c, row_number() over (Partition BY a_id) as row
    FROM table_name
    ) rs
WHERE row <= 10000
AND a_id in (1, 2, 3)
GROUP BY a_id, b, c;

Try the above code and This will output up to 10,000 randomly-chosen rows per a_id. You can partition it further if you're looking to group by more than just a_id.

answered Nov 30, 2018 by Omkar
• 67,620 points

One doubt here for the subquery if you can answer

SELECT a_id, b, c, row_number() over (Partition BY a_id) as row FROM table_name

This inner query will be executed for all the rows for each request.
For example if user needs data from 50th row  for one request, next user need to see from 100 th row (concept of pagination) so inner query will be executed for each request. Can that be a performance issue.

Yes @Debapriya, this query will be executed for each request and it may cause performance issue but in such cases we need to choose between time and space. If you want to decrease space complexity(if this query needs to be executed frequently), one way to do this it is by creating another sub-table of the result and then get result from that data. But this will occupy space.

Related Questions In Big Data Hadoop

0 votes
1 answer

How to change the location of a table in hive?

Hey, Basically When we create a table in hive, ...READ MORE

answered May 14 in Big Data Hadoop by Gitika
• 25,340 points
174 views
0 votes
1 answer

How to see the content of a table in hive?

Hello, If you want to see the content ...READ MORE

answered May 14 in Big Data Hadoop by Gitika
• 25,340 points
94 views
0 votes
1 answer

How to Modify the Maximum Number of Versions for a Column Family in Hbase?

Hey, The example uses HBase Shell to keep ...READ MORE

answered May 31 in Big Data Hadoop by Gitika
• 25,340 points
47 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
2,981 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
332 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
14,739 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,240 points
1,095 views
+1 vote
1 answer

How to count number of rows in alias in PIG?

COUNT is part of pig LOGS= LOAD 'log'; LOGS_GROUP= ...READ MORE

answered Oct 15, 2018 in Big Data Hadoop by Omkar
• 67,620 points
105 views
0 votes
1 answer

Hadoop Hive: How to skip the first line of csv while loading in hive table?

You can try this: CREATE TABLE temp ...READ MORE

answered Nov 8, 2018 in Big Data Hadoop by Omkar
• 67,620 points
988 views