Hadoop: Using composite-key

+1 vote

Suppose I have a tab delimited file containing user activity data formatted like this:

timestamp  user_id  page_id  action_id

I want to write a hadoop job to count user actions on each page, so the output file should look like this:

user_id  page_id  number_of_actions

I need something like composite key here - it would contain user_id and page_id. Is there any generic way to do this with hadoop? I couldn't find anything helpful. So far I'm emitting key like this in mapper:

context.write(new Text(user_id + "\t" + page_id), one);

It works, but I feel that it's not the best solution.

Nov 12, 2018 in Big Data Hadoop by slayer
• 29,170 points
247 views

1 answer to this question.

+1 vote

You can use a Writable, something like this:

public class UserPageWritable implements WritableComparable<UserPageWritable> {

  private String userId;
  private String pageId;

  @Override
  public void readFields(DataInput in) throws IOException {
    userId = in.readUTF();
    pageId = in.readUTF();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(userId);
    out.writeUTF(pageId);
  }

  @Override
  public int compareTo(UserPageWritable o) {
    return ComparisonChain.start().compare(userId, o.userId)
        .compare(pageId, o.pageId).result();
  }

}
answered Nov 12, 2018 by Omkar
• 67,660 points

Related Questions In Big Data Hadoop

+1 vote
2 answers

Failed to restart Hadoop namenode using cloudera quickstart

You can use cloudera manager to manage ...READ MORE

answered Mar 19, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
821 views
0 votes
3 answers

Can we run Spark without using Hadoop?

No, you can run spark without hadoop. ...READ MORE

answered May 7 in Big Data Hadoop by pradeep
197 views
0 votes
0 answers

Error running hadoop mapreduce in Python using Hadoop Streaming

I was trying a sample mapredyce code ...READ MORE

Apr 2, 2018 in Big Data Hadoop by nitinrawat895
• 10,710 points
156 views
0 votes
1 answer

How to get started with Hadoop and do some development using Eclipse IDE?

Alright, there are couple of things that ...READ MORE

answered Apr 4, 2018 in Big Data Hadoop by Ashish
• 2,630 points
125 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,710 points
3,294 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,710 points
390 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
16,207 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
1,182 views
0 votes
1 answer

Hadoop: How to keep duplicates in Hive using collect_set()?

SELECT hash_id, COLLECT_LIST(num_of_cats) AS ...READ MORE

answered Nov 2, 2018 in Big Data Hadoop by Omkar
• 67,660 points
330 views
0 votes
1 answer

Hadoop: Reading and Writing Sequencefile using Apis?

public class SequenceFilesTest { @Test ...READ MORE

answered Nov 9, 2018 in Big Data Hadoop by Omkar
• 67,660 points
33 views