Hadoop: How to Group mongodb - mapReduce output?

0 votes

Ihave a result of key value pair from mapReduce function , now i want to run the query on this output of mapReduce.

So i am using mapReduce to find out the stats of user like this

db.order.mapReduce(function() { emit (this.customer,{count:1,orderDate:this.orderDate.interval_start}) },
function(key,values){ 
    var sum =0 ; var lastOrderDate;  
    values.forEach(function(value) {
     if(value['orderDate']){ 
        lastOrderDate=value['orderDate'];
    }  
    sum+=value['count'];
}); 
    return {count:sum,lastOrderDate:lastOrderDate}; 
},
{ query:{status:"DELIVERED"},out:"order_total"}).find()

which give me output like this

{ "_id" : ObjectId("5443765ae4b05294c8944d5b"), "value" : { "count" : 1, "orderDate" : ISODate("2014-10-18T18:30:00Z") } }
{ "_id" : ObjectId("54561911e4b07a0a501276af"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2015-03-14T18:30:00Z") } }
{ "_id" : ObjectId("54561b9ce4b07a0a501276b1"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-01T18:30:00Z") } }
{ "_id" : ObjectId("5458712ee4b07a0a501276c2"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2014-11-03T18:30:00Z") } }
{ "_id" : ObjectId("545f64e7e4b07a0a501276db"), "value" : { "count" : 15, "lastOrderDate" : ISODate("2015-06-04T18:30:00Z") } }
{ "_id" : ObjectId("54690771e4b0070527c657ed"), "value" : { "count" : 6, "lastOrderDate" : ISODate("2015-06-03T18:30:00Z") } }
{ "_id" : ObjectId("54696c64e4b07f3c07010b4a"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-18T18:30:00Z") } }
{ "_id" : ObjectId("546980d1e4b07f3c07010b4d"), "value" : { "count" : 4, "lastOrderDate" : ISODate("2015-03-24T18:30:00Z") } }
{ "_id" : ObjectId("54699ac4e4b07f3c07010b51"), "value" : { "count" : 30, "lastOrderDate" : ISODate("2015-05-23T18:30:00Z") } }
{ "_id" : ObjectId("54699d0be4b07f3c07010b55"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-16T18:30:00Z") } }
{ "_id" : ObjectId("5469a1dce4b07f3c07010b59"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2015-04-29T18:30:00Z") } }
{ "_id" : ObjectId("5469a96ce4b07f3c07010b5e"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-16T18:30:00Z") } }
{ "_id" : ObjectId("5469c1ece4b07f3c07010b64"), "value" : { "count" : 9, "lastOrderDate" : ISODate("2015-04-15T18:30:00Z") } }
{ "_id" : ObjectId("5469f422e4b0ce7d5ee021ad"), "value" : { "count" : 5, "lastOrderDate" : ISODate("2015-06-01T18:30:00Z") } }
......

Now i want to run query and group the users on the basis of count in different categories like for user with count less than 5 in one group , 5-10 in another, etc

and want output something like this

{userLessThan5: 9 }
{user5to10: 2 }
{user10to15: 1 }
{user15to20: 0 }
  ....

How can I do it?

Oct 31, 2018 in Big Data Hadoop by digger
• 27,630 points
73 views

1 answer to this question.

0 votes
db.order.mapReduce(function() { emit (this.customer,{count:1,orderDate:this.orderDate.interval_start}) },
function(key,values){ 
var category; // add this new field
var sum =0 ; var lastOrderDate;  
values.forEach(function(value) {
 if(value['orderDate']){ 
    lastOrderDate=value['orderDate'];
}  
sum+=value['count'];
}); 
// at this point you are already aware in which category your records lies , just add a new field to mark it
 if(sum < 5){ category: userLessThan5};
 if(sum >= 5 && sum <=10){ category: user5to10};
 if(sum <= 10 && sum >= 15){ category: user10to15};
 if(sum <= 15 && sum >=20){ category: user15to20};
  ....
return {count:sum,lastOrderDate:lastOrderDate,category:category}; 
},
{ query:{status:"DELIVERED"},out:"order_total"}).find()
 db.order_total.aggregate([{ $group: { "_id": "$value.category", "users": { $sum: 1 } } }]);

answered Oct 31, 2018 by Omkar
• 67,290 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to format the output being written by MapReduce in Hadoop?

Here is a simple code demonstrate the ...READ MORE

answered Sep 5, 2018 in Big Data Hadoop by Frankie
• 9,810 points
93 views
0 votes
1 answer

How can we send data from MongoDB to Hadoop?

The MongoDB Connector for Hadoop reads data ...READ MORE

answered Mar 26, 2018 in Big Data Hadoop by nitinrawat895
• 10,510 points
97 views
0 votes
1 answer

How hadoop mapreduce job is submitted to worker nodes?

Alright, I think you are basically looking ...READ MORE

answered Mar 29, 2018 in Big Data Hadoop by Ashish
• 2,630 points
1,541 views
0 votes
1 answer

Integration of Hadoop with Mongo DB concept

MongoDB isn't built to work on top ...READ MORE

answered Sep 25, 2018 in Big Data Hadoop by Frankie
• 9,810 points
52 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,510 points
2,399 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,510 points
245 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
12,205 views
0 votes
1 answer

In Hadoop MapReduce, how can i set an Object as the Value for Map output?

Try this and see if it works: public ...READ MORE

answered Nov 20, 2018 in Big Data Hadoop by Omkar
• 67,290 points
33 views
0 votes
1 answer

Hadoop: How to get the column name along with the output in Hive?

You can get the column names by ...READ MORE

answered Nov 20, 2018 in Big Data Hadoop by Omkar
• 67,290 points
187 views