What is a data serialization system?

According to the Apache Avro project, "Avro is a serialization system". By saying "data serialization system", does that mean Avro is a product or an API?

Also, I am not quite sure what a data serialization system is. For now, my understanding is that it is a protocol that defines how a data object is passed over the network. Can anyone explain it in an intuitive way, so that people with a limited distributed computing background can understand it?
Oct 17, 2018 in Big Data Hadoop by Neha

1 answer to this question.

When Doug Cutting was writing Hadoop, he decided that the standard Java way of serializing objects, Java Object Serialization (Java Serialization), didn't meet his requirements for Hadoop. Those requirements were:

  1. Serialize the data into a compact binary format.
  2. Be fast, both in performance and in how quickly it allows data to be transferred.
  3. Be interoperable, so that other languages can plug into Hadoop more easily.

As he described Java Serialization:

"It looked big and hairy and I thought we needed something lean and mean"

Instead of using Java Serialization, they wrote their own serialization framework. The main perceived problem with Java Serialization was that it writes the class name of each object being serialized to the stream, with each subsequent instance of that class containing a 5-byte reference to the first instead of the class name.

As well as reducing the effective bandwidth of the stream, this causes problems with random access and with sorting records in a serialized stream. Hadoop serialization therefore writes neither the class name nor the references, and assumes that the client knows the expected type.
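To make the size difference concrete, here is a minimal sketch using only the JDK. It serializes one small object with Java Serialization and then writes the same two fields as raw values, the way a type-aware reader would expect them. The `Point` and `SerializedSizeDemo` classes are made up for this illustration and are not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical record type, used only for this illustration.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class SerializedSizeDemo {
    // Java Serialization: the stream carries a header and a class descriptor
    // (class name, serialVersionUID, field names and types) before the values.
    static int javaSerializedSize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Field values only: the reader is assumed to already know the type,
    // so nothing but the two 4-byte ints is written.
    static int rawFieldSize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("Java Serialization: " + javaSerializedSize(p) + " bytes");
        System.out.println("Raw fields only:    " + rawFieldSize(p) + " bytes");
    }
}
```

The raw version is always 8 bytes for this type. The Java Serialization figure is several times larger for a single object, because the class metadata travels with the data.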

Java Serialization also creates a new object for each one that is deserialized. Hadoop Writables, which implement Hadoop serialization, can be reused. This helps the performance of MapReduce, which ends up serializing and deserializing billions of records.
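The reuse idea can be sketched without Hadoop on the classpath. The class below is a hypothetical stand-in that copies the shape of Hadoop's `Writable` contract (`write(DataOutput)` and `readFields(DataInput)`); `ReusableCounter` and `ReuseDemo` are names invented for this example. The point is that `readFields` overwrites the state of an existing instance, so reading many records needs only one object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in modeled on the Writable contract; not a Hadoop class.
class ReusableCounter {
    private int value;

    void set(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }

    // Writes only the field value, with no class name or object reference.
    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Overwrites this instance's state in place instead of allocating a new object.
    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }
}

class ReuseDemo {
    // Writes the numbers 1..count as records, then reads them all back
    // through ONE reader instance and returns their sum.
    static long sumWithSingleInstance(int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            ReusableCounter writer = new ReusableCounter();
            for (int i = 1; i <= count; i++) {
                writer.set(i);
                writer.write(out);
            }
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            ReusableCounter reader = new ReusableCounter();
            long sum = 0;
            for (int i = 0; i < count; i++) {
                reader.readFields(in); // same object, refilled for every record
                sum += reader.get();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithSingleInstance(100));
    }
}
```

With Java Serialization, the same loop would allocate one new object per record read, which adds up when the record count is in the billions.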

Avro fits into Hadoop in that it approaches serialization in a different manner. The client and server exchange a schema which describes the data stream. This helps make it fast and compact and, importantly, makes it easier to mix languages together.

So Avro defines a serialization format, a protocol for clients and servers to communicate these serial streams, and a way to compactly persist data in files.
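For intuition on why a shared schema makes the data compact, here is a rough sketch that does not use the Avro library. It follows the rules of Avro's binary encoding as I understand them from the specification: longs are written as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, and record fields in schema order with no field names. The `SchemaSketch` class and the array-based `USER_SCHEMA` are assumptions made for this example; real Avro schemas are written in JSON and handled by the library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaSketch {
    // Hypothetical schema: field name and type, in order, known to both sides.
    static final String[][] USER_SCHEMA = { {"name", "string"}, {"age", "long"} };

    // Zig-zag maps small negative and positive numbers to small unsigned ones,
    // which are then written 7 bits at a time (high bit means "more follows").
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zig-zag
    }

    // Writes values only, in schema order; no field names or types in the output.
    static byte[] encode(Map<String, Object> record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String[] field : USER_SCHEMA) {
            Object value = record.get(field[0]);
            if (field[1].equals("string")) {
                byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
                writeLong(out, utf8.length);
                out.write(utf8, 0, utf8.length);
            } else {
                writeLong(out, ((Number) value).longValue());
            }
        }
        return out.toByteArray();
    }

    // The reader relies entirely on the schema to know what comes next.
    static Map<String, Object> decode(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        Map<String, Object> record = new LinkedHashMap<>();
        for (String[] field : USER_SCHEMA) {
            if (field[1].equals("string")) {
                byte[] utf8 = new byte[(int) readLong(in)];
                in.get(utf8);
                record.put(field[0], new String(utf8, StandardCharsets.UTF_8));
            } else {
                record.put(field[0], readLong(in));
            }
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "Ada");
        user.put("age", 36L);
        byte[] bytes = encode(user);
        System.out.println(bytes.length + " bytes -> " + decode(bytes));
    }
}
```

Because the byte layout depends only on the schema and not on any Java class, a reader written in another language can decode the same bytes as long as it has the same schema.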

answered Oct 17, 2018 by Frankie
