Hadoop: Using a composite key

Suppose I have a tab-delimited file containing user activity data formatted like this:

timestamp  user_id  page_id  action_id

I want to write a Hadoop job that counts each user's actions on each page, so the output file should look like this:

user_id  page_id  number_of_actions

I need something like a composite key here, one that contains both user_id and page_id. Is there a generic way to do this in Hadoop? I couldn't find anything helpful. So far I'm emitting the key like this in the mapper:

context.write(new Text(user_id + "\t" + page_id), one);

It works, but I feel that it's not the best solution.

Nov 12, 2018 in Big Data Hadoop by slayer
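For reference, the logic this job needs can be sketched locally in plain Java, without Hadoop: split each line on tabs, build a composite (user_id, page_id) key, and sum one per action for each key. In this sketch a TreeMap stands in for the MapReduce shuffle-and-sort phase, so keys come out ordered by user_id and then page_id. The class name LocalActionCount, the record UserPage and the sample lines are hypothetical, and the record syntax assumes Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalActionCount {

  // Hypothetical composite key: orders by user_id first, then page_id.
  public record UserPage(String userId, String pageId) implements Comparable<UserPage> {
    @Override
    public int compareTo(UserPage o) {
      int cmp = userId.compareTo(o.userId);
      return cmp != 0 ? cmp : pageId.compareTo(o.pageId);
    }

    // Render the key the way the desired output expects: user_id<TAB>page_id.
    @Override
    public String toString() {
      return userId + "\t" + pageId;
    }
  }

  // "Map": split each line and build the key; "reduce": sum one per action.
  // The TreeMap plays the role of the shuffle/sort, keeping keys ordered.
  public static Map<UserPage, Integer> count(List<String> lines) {
    Map<UserPage, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      String[] f = line.split("\t"); // timestamp, user_id, page_id, action_id
      counts.merge(new UserPage(f[1], f[2]), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> lines = List.of( // hypothetical sample input
        "1000\tu1\tp1\tclick",
        "1001\tu1\tp1\tscroll",
        "1002\tu2\tp1\tclick",
        "1003\tu1\tp2\tclick");
    // Prints u1 p1 2, then u1 p2 1, then u2 p1 1 (tab-separated).
    count(lines).forEach((key, n) -> System.out.println(key + "\t" + n));
  }
}
```

Each printed line follows the user_id, page_id, number_of_actions layout described in the question.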

You can define a custom key type that implements WritableComparable, something like this:

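import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class UserPageWritable implements WritableComparable<UserPageWritable> {
  private String userId;
  private String pageId;

  public UserPageWritable() {} // Hadoop creates keys by reflection: no-arg constructor required
  public UserPageWritable(String userId, String pageId) {
    this.userId = userId;
    this.pageId = pageId;
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    userId = in.readUTF(); // read in the same order as write()
    pageId = in.readUTF();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(userId);
    out.writeUTF(pageId);
  }

  @Override
  public int compareTo(UserPageWritable o) { // sort by user_id, then page_id
    int cmp = userId.compareTo(o.userId);
    return cmp != 0 ? cmp : pageId.compareTo(o.pageId);
  }

  @Override
  public int hashCode() { // HashPartitioner uses this: equal keys must hash equally
    return 31 * userId.hashCode() + pageId.hashCode();
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof UserPageWritable
        && userId.equals(((UserPageWritable) o).userId)
        && pageId.equals(((UserPageWritable) o).pageId);
  }

  @Override
  public String toString() { return userId + "\t" + pageId; } // TextOutputFormat output
}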
answered Nov 12, 2018 by Omkar
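Two properties of such a key are easy to get wrong: readFields must read the fields back in exactly the order write wrote them, and equal keys must produce equal hashCode values, because Hadoop's default HashPartitioner chooses a reducer from hashCode (and Object's default hashCode is identity-based, so without an override the same user/page pair can reach different reducers). The sketch below checks both with plain java.io streams and no cluster; it mirrors the two writeUTF/readUTF calls in the answer and the HashPartitioner arithmetic rather than calling the Hadoop classes. UserPageKeyCheck and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

public class UserPageKeyCheck {

  // Writes the fields in the same order as UserPageWritable.write().
  public static byte[] serialize(String userId, String pageId) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(userId);
      out.writeUTF(pageId);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the fields back in the same order, as readFields() must.
  public static String[] deserialize(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      String userId = in.readUTF();
      String pageId = in.readUTF();
      return new String[] {userId, pageId};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Same arithmetic as HashPartitioner.getPartition(): clear the sign bit, then mod.
  public static int partition(int keyHash, int numReduceTasks) {
    return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
  }

  public static void main(String[] args) {
    String[] back = deserialize(serialize("u1", "p1"));
    System.out.println(back[0] + "\t" + back[1]); // prints "u1<TAB>p1"

    // Value-based hashes: two separately built equal keys hash the same,
    // so they are routed to the same reducer.
    int first = partition(Objects.hash("u1", "p1"), 4);
    int second = partition(Objects.hash(new String("u1"), new String("p1")), 4);
    System.out.println(first == second); // prints "true"
  }
}
```

In the real job, the key class is registered in the driver with job.setMapOutputKeyClass(UserPageWritable.class), and with job.setOutputKeyClass as well if the reducer emits the same key type.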

Related Questions In Big Data Hadoop

Failed to restart Hadoop namenode using cloudera quickstart

Can we run Spark without using Hadoop?

Error running hadoop mapreduce in Python using Hadoop Streaming

How to get started with Hadoop and do some development using Eclipse IDE?

Hadoop Mapreduce word count Program

hadoop.mapred vs hadoop.mapreduce?

hadoop fs -put command?

Hadoop dfs -ls command?

Hadoop: How to keep duplicates in Hive using collect_set()?

Hadoop: Reading and Writing Sequencefile using Apis?
