Difference between Text and String in Hadoop

Question

Elaborate the difference between org.apache.hadoop.io.Text and java.lang.String in the Apache Hadoop framework. Is it not possible to use String instead of introducing a new Text class? I have tried to find the difference, but I don't understand it yet. Can anyone explain it to me with suitable examples?

ravikiran · Answer

String does not implement Hadoop's Writable interface, so with Hadoop's default Writable serialization it cannot be used directly as a key or value type. Text is the Writable equivalent of String for UTF-8 data.

The binary representation of a Text object is a variable-length integer containing the number of bytes in the UTF-8 representation of the string, followed by the UTF-8 bytes themselves.

Text is a replacement for the UTF8 class, which was deprecated because it didn't support strings whose encoding was over 32,767 bytes and because it used Java's modified UTF-8. Text uses standard UTF-8, which makes it potentially easier to interoperate with other tools that understand UTF-8.

Following are some of the differences, in brief, in how Text behaves compared with String:

Indexing: Because of its emphasis on using standard UTF-8, there are some differences between Text and the Java String class. Indexing for the Text class is in terms of position in the encoded byte sequence, not the Unicode character in the string or the Java char code unit (as it is for String). For instance, charAt() returns an int representing a Unicode code point, unlike the String variant, which returns a char.

Iteration: Iterating over the Unicode characters in Text is complicated by the use of byte offsets for indexing, since you can't just increment the index by one; a character may occupy between one and four bytes.

Mutable: Another difference from String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it.

Resorting to String: Text doesn't have as rich an API for manipulating strings as java.lang.String, so in many cases you need to convert the Text object to a String. This is done in the usual way, using the toString() method.

I hope this helps.
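To make the indexing and iteration points concrete, here is a minimal sketch that uses only the JDK, so it runs without Hadoop on the classpath. It does not call Text itself; instead it reproduces the two views of the same string: Java char units (what String indexes by) and standard UTF-8 byte offsets (what Text indexes by). The class name TextVsStringSketch and its helper methods are hypothetical names made up for this illustration. In Hadoop, the byte count corresponds to what Text.getLength() reports and the decoding step to what Text.charAt(int) does at a byte position.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Hadoop.
class TextVsStringSketch {

    // Number of bytes in the standard UTF-8 encoding of s.
    // This is the unit Text uses for lengths and positions.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length in bytes of the UTF-8 sequence that starts with leadByte.
    // Throws if the byte is a continuation byte or an invalid lead byte.
    static int sequenceLength(byte leadByte) {
        int lead = leadByte & 0xFF;
        if (lead < 0x80) return 1;
        if (lead >= 0xC0 && lead < 0xE0) return 2;
        if (lead >= 0xE0 && lead < 0xF0) return 3;
        if (lead >= 0xF0 && lead < 0xF8) return 4;
        throw new IllegalArgumentException("not the start of a UTF-8 sequence");
    }

    // Decode the Unicode code point whose encoding starts at the given byte offset.
    static int codePointAtByteOffset(byte[] utf8, int offset) {
        int len = sequenceLength(utf8[offset]);
        return new String(utf8, offset, len, StandardCharsets.UTF_8).codePointAt(0);
    }

    // Byte offsets at which each code point starts. Note the step is the
    // sequence length, not 1, which is why iteration is less direct than with String.
    static List<Integer> codePointOffsets(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < utf8.length; i += sequenceLength(utf8[i])) {
            offsets.add(i);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Four code points taking 1, 2, 3 and 4 bytes in UTF-8.
        String s = "A\u00DF\u6771\uD801\uDC00";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        System.out.println(s.length());          // 5 char units (last code point is a surrogate pair)
        System.out.println(utf8Length(s));       // 10 bytes
        System.out.println(codePointOffsets(s)); // [0, 1, 3, 6]
        System.out.println(Integer.toHexString(codePointAtByteOffset(utf8, 6))); // 10400
        System.out.println(Integer.toHexString(s.charAt(3)));                    // d801, half of a pair
    }
}
```

The same position therefore means different things in the two classes: index 3 in the String is the first half of a surrogate pair, while byte offset 3 in the UTF-8 form is the start of the third character.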


