Hadoop Using composite-key

+1 vote

Suppose I have a tab delimited file containing user activity data formatted like this:

timestamp  user_id  page_id  action_id

I want to write a hadoop job to count user actions on each page, so the output file should look like this:

user_id  page_id  number_of_actions

I need something like composite key here - it would contain user_id and page_id. Is there any generic way to do this with hadoop? I couldn't find anything helpful. So far I'm emitting key like this in mapper:

context.write(new Text(user_id + "\t" + page_id), one);

It works, but I feel that it's not the best solution.

Nov 12, 2018 in Big Data Hadoop by slayer
• 29,310 points
1,081 views

1 answer to this question.

+1 vote

You can use a Writable, something like this:

public class UserPageWritable implements WritableComparable<UserPageWritable> {

  private String userId;
  private String pageId;

  @Override
  public void readFields(DataInput in) throws IOException {
    userId = in.readUTF();
    pageId = in.readUTF();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(userId);
    out.writeUTF(pageId);
  }

  @Override
  public int compareTo(UserPageWritable o) {
    return ComparisonChain.start().compare(userId, o.userId)
        .compare(pageId, o.pageId).result();
  }

}
answered Nov 12, 2018 by Omkar
• 69,170 points

Related Questions In Big Data Hadoop

+1 vote
2 answers

Failed to restart Hadoop namenode using cloudera quickstart

You can use Cloudera Manager to manage ...READ MORE

answered Mar 19, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points

edited Jun 9, 2020 by MD 2,389 views
0 votes
3 answers

Can we run Spark without using Hadoop?

No, you can run spark without hadoop. ...READ MORE

answered May 7, 2019 in Big Data Hadoop by pradeep
670 views
0 votes
1 answer

Error running hadoop mapreduce in Python using Hadoop Streaming

Hi As you write mapper and reducer program  ...READ MORE

answered Jan 21, 2020 in Big Data Hadoop by anonymous
940 views
0 votes
1 answer

How to get started with Hadoop and do some development using Eclipse IDE?

Alright, there are couple of things that ...READ MORE

answered Apr 4, 2018 in Big Data Hadoop by Ashish
• 2,650 points
1,150 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
7,893 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
1,327 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
62,497 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
2,992 views
0 votes
1 answer

Hadoop: How to keep duplicates in Hive using collect_set()?

SELECT hash_id, COLLECT_LIST(num_of_cats) AS ...READ MORE

answered Nov 2, 2018 in Big Data Hadoop by Omkar
• 69,170 points
1,192 views
0 votes
1 answer

Hadoop: Reading and Writing Sequencefile using Apis?

public class SequenceFilesTest { @Test ...READ MORE

answered Nov 9, 2018 in Big Data Hadoop by Omkar
• 69,170 points
133 views