What is the difference between Writable & WritableComparable in Hadoop?

Could anyone please explain the following:

What are the Writable and WritableComparable interfaces in Hadoop?

What is the difference between the two?

Please explain with an example.
Oct 3, 2018 in Big Data Hadoop by Neha


Writable is an interface in Hadoop, and the key and value types used in Hadoop MapReduce must implement it. Hadoop provides Writable wrappers for almost all Java primitive types and some other types, but sometimes we need to pass custom objects, and these custom objects should implement Hadoop's Writable interface. Hadoop MapReduce uses Writable implementations for the data passed into and out of user-provided Mappers and Reducers.

To implement the Writable interface, a class needs two methods:

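import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    void write(DataOutput out) throws IOException;      // serialize fields to the stream
    void readFields(DataInput in) throws IOException;   // read fields back in the same order
}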
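A minimal sketch of a custom Writable, assuming a hypothetical value type PointWritable that holds two ints. To keep it runnable with the JDK alone, it declares a local stand-in Writable interface with the same two method signatures as org.apache.hadoop.io.Writable; in a real job you would implement Hadoop's interface instead. The WritableRoundTrip helper (also hypothetical) serializes an object to bytes and reads it back into an existing instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same methods as org.apache.hadoop.io.Writable,
// declared here only so the sketch compiles without the Hadoop jar.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom value type holding two ints.
class PointWritable implements Writable {
    private int x;
    private int y;

    // Hadoop typically instantiates Writables via reflection,
    // so a no-argument constructor is expected.
    PointWritable() {}

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        x = in.readInt();
        y = in.readInt();
    }
}

class WritableRoundTrip {
    // Serializes any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            w.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Refills an existing Writable from bytes (Hadoop reuses objects in this way).
    static void fromBytes(byte[] bytes, Writable w) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Two ints serialize to exactly 8 bytes with no type information attached; the reader is expected to know which class to deserialize into.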

Why use Hadoop Writable(s)?

As we already know, data needs to be transmitted between different nodes in a distributed computing environment. This requires serialization and deserialization: converting structured data into a byte stream and back again. Hadoop therefore uses a simple and efficient serialization protocol, the Writable interface, to move data between the map and reduce phases. Examples of built-in Writables include IntWritable, LongWritable, BooleanWritable, and FloatWritable.
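To illustrate the "efficient" part, here is a small comparison using only the JDK. writableStyle encodes an int with a single DataOutput.writeInt call, which is what IntWritable.write does; javaSerialization encodes the same value with standard Java serialization. The class and method names are hypothetical, chosen for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class SerializationSize {
    // Writable-style encoding: 4 bytes, big-endian, no class metadata.
    static byte[] writableStyle(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Standard Java serialization of the boxed Integer, for comparison.
    // The output includes a stream header and class descriptors.
    static byte[] javaSerialization(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(Integer.valueOf(value));
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The Writable-style form is always 4 bytes, while Java serialization adds class descriptors to every stream, which adds up when millions of intermediate records are shuffled.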

Refer to https://developer.yahoo.com/hadoop/tutorial/module5.html for an example.

The WritableComparable interface is simply a subinterface of both Writable and java.lang.Comparable. A class implementing WritableComparable must therefore provide a compareTo method in addition to the readFields and write methods, as shown below:

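import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface WritableComparable<T> extends Writable, Comparable<T> {
    // The real Hadoop interface declares no methods of its own; they are
    // restated here to show everything an implementing class must provide.
    void write(DataOutput out) throws IOException;      // from Writable
    void readFields(DataInput in) throws IOException;   // from Writable
    int compareTo(T o);                                 // from Comparable<T>
}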

Comparison of types is crucial for MapReduce, where there is a sorting phase during which keys are compared with one another.
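A minimal sketch of a custom key, assuming a hypothetical composite key YearStationKey that sorts by year and then by station id. Local stand-ins mirror the shapes of Hadoop's Writable and WritableComparable so the example runs with the JDK alone, and KeySortDemo (also hypothetical) imitates the framework's sort step on a few keys.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-ins matching the shapes of org.apache.hadoop.io.Writable
// and org.apache.hadoop.io.WritableComparable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical composite key: ordered by year, then by station id.
class YearStationKey implements WritableComparable<YearStationKey> {
    private int year;
    private String station;

    YearStationKey() {}

    YearStationKey(int year, String station) {
        this.year = year;
        this.station = station;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeUTF(station);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        station = in.readUTF();
    }

    @Override
    public int compareTo(YearStationKey other) {
        int byYear = Integer.compare(year, other.year);
        return byYear != 0 ? byYear : station.compareTo(other.station);
    }

    // equals/hashCode should agree with compareTo; Hadoop's default
    // HashPartitioner uses hashCode() to choose a reducer for each key.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof YearStationKey)) return false;
        YearStationKey k = (YearStationKey) o;
        return year == k.year && station.equals(k.station);
    }

    @Override
    public int hashCode() {
        return 31 * year + station.hashCode();
    }

    @Override
    public String toString() {
        return year + "/" + station;
    }
}

class KeySortDemo {
    // Imitates the sort step: keys are ordered using compareTo.
    static List<YearStationKey> sorted(List<YearStationKey> keys) {
        List<YearStationKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }
}
```

Sorting (2018, B), (2017, Z), (2018, A) with this key yields 2017/Z, 2018/A, 2018/B.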

Implementing a comparator for WritableComparables, such as one based on the org.apache.hadoop.io.RawComparator interface, can help speed up your MapReduce (MR) jobs. As you may recall, an MR job consumes and produces key-value pairs. The process looks like the following:

(K1,V1) –> Map –> (K2,V2)
(K2,List[V2]) –> Reduce –> (K3,V3)
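The flow above can be simulated in memory with plain Java, using word count as the example. This is not Hadoop code and MiniMapReduce is a hypothetical name; it only mirrors the data flow: map emits (word, 1) pairs, the shuffle/sort step groups them by key in sorted order, and reduce sums each group.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniMapReduce {
    // Map: (K1 = line offset, V1 = line text) -> list of (K2 = word, V2 = 1)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle and sort: group V2 values by K2. The TreeMap keeps keys in
    // sorted order, which is why keys must be comparable.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: (K2, List[V2]) -> (K3 = word, V3 = count)
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int v : ones) {
            sum += v;
        }
        return sum;
    }

    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            intermediate.addAll(map(offset, line));
            offset += line.length() + 1;  // approximates a byte-offset K1 for ASCII input
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffleAndSort(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For the input lines "the cat" and "the dog", the result is {cat=1, dog=1, the=2}.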

The key-value pairs (K2,V2) are called the intermediary key-value pairs. They are passed from the mapper to the reducer. Before these intermediary key-value pairs reach the reducer, a shuffle and sort step is performed.

The shuffle assigns the intermediary keys (K2) to reducers, and the sort orders those keys. Implementing a RawComparator for the intermediary keys can significantly speed up sorting, because a RawComparator compares keys directly in their serialized byte form. Without one, the intermediary keys would have to be fully deserialized before each comparison.
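The byte-level comparison idea can be sketched with the JDK alone. RawBytesComparator below is a hypothetical stand-in that declares only the compare(byte[], int, int, byte[], int, int) method found on org.apache.hadoop.io.RawComparator, and IntRawComparator is likewise a hypothetical name. It orders int keys serialized with DataOutput.writeInt by reading the values straight out of the byte buffers, so no key objects are created.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in declaring only the byte-level method of
// org.apache.hadoop.io.RawComparator, so the sketch runs without Hadoop.
interface RawBytesComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Orders serialized int keys without turning them back into key objects.
class IntRawComparator implements RawBytesComparator {
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Decode the 4-byte big-endian ints in place and compare numerically.
        // A plain unsigned byte-by-byte comparison would sort negative numbers
        // after positive ones because of the sign bit.
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Reads a big-endian int starting at 'start', as DataOutput.writeInt wrote it.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
                | ((b[start + 1] & 0xff) << 16)
                | ((b[start + 2] & 0xff) << 8)
                | (b[start + 3] & 0xff);
    }

    // Demo helper: serialize an int the same way IntWritable.write does.
    static byte[] serialize(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The offset and length parameters let the comparator work on keys that sit inside a larger shared buffer, which is how sorted intermediate data is typically laid out.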

Summary:

1) WritableComparables can be compared to each other, typically via Comparators. Any type that is used as a key in the Hadoop MapReduce framework should implement this interface.

2) Any type that is used as a value in the Hadoop MapReduce framework should implement the Writable interface.

answered Oct 3, 2018 by Frankie