Hadoop: Using a composite key

Suppose I have a tab-delimited file containing user activity data, formatted like this:

timestamp  user_id  page_id  action_id

I want to write a Hadoop job to count user actions on each page, so the output file should look like this:

user_id  page_id  number_of_actions

I need something like a composite key here, one that contains both user_id and page_id. Is there a generic way to do this in Hadoop? I couldn't find anything helpful. So far I'm emitting the key like this in the mapper:

context.write(new Text(user_id + "\t" + page_id), one);

It works, but I feel that it's not the best solution.

Nov 12, 2018 in Big Data Hadoop by slayer

1 answer to this question.

You can use a custom WritableComparable as the key, something like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import com.google.common.collect.ComparisonChain;

public class UserPageWritable implements WritableComparable<UserPageWritable> {

  private String userId;
  private String pageId;

  // Hadoop creates key instances via reflection, so keep a no-arg constructor
  public UserPageWritable() {
  }

  public UserPageWritable(String userId, String pageId) {
    this.userId = userId;
    this.pageId = pageId;
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Deserialize the fields in the same order write() serialized them
    userId = in.readUTF();
    pageId = in.readUTF();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(userId);
    out.writeUTF(pageId);
  }

  @Override
  public int compareTo(UserPageWritable o) {
    // Sort by userId first, then by pageId (Guava's ComparisonChain)
    return ComparisonChain.start().compare(userId, o.userId)
        .compare(pageId, o.pageId).result();
  }

  @Override
  public String toString() {
    // TextOutputFormat uses toString() when writing the key to the output file
    return userId + "\t" + pageId;
  }
}
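For context, here is a minimal sketch of a mapper and reducer built around this key, following the question's column layout (timestamp, user_id, page_id, action_id, tab-separated). The class names ActionCount, ActionCountMapper and ActionCountReducer are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ActionCount {

  // Hypothetical mapper: emits (user_id, page_id) -> 1 for every input record
  public static class ActionCountMapper
      extends Mapper<LongWritable, Text, UserPageWritable, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Columns: timestamp, user_id, page_id, action_id (tab-separated)
      String[] fields = line.toString().split("\t");
      context.write(new UserPageWritable(fields[1], fields[2]), ONE);
    }
  }

  // Hypothetical reducer: sums the 1s for each (user_id, page_id) pair
  public static class ActionCountReducer
      extends Reducer<UserPageWritable, IntWritable, UserPageWritable, IntWritable> {

    @Override
    protected void reduce(UserPageWritable key, Iterable<IntWritable> counts,
        Context context) throws IOException, InterruptedException {
      int total = 0;
      for (IntWritable count : counts) {
        total += count.get();
      }
      // With TextOutputFormat this prints: user_id \t page_id \t total
      context.write(key, new IntWritable(total));
    }
  }
}

One more note: if the job runs with more than one reducer, also override hashCode() and equals() in UserPageWritable (or set a custom Partitioner on the job), because the default HashPartitioner uses the key's hashCode() to decide which reducer receives each record.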
answered Nov 12, 2018 by Omkar
