How does Avro schema evolution work?


I am new to Hadoop and programming, and I am a little confused about Avro schema evolution. Let me explain what I understand about Avro so far.

Avro is a serialization tool that stores binary data with its JSON schema at the top of the file. The schema looks like this:

{
    "namespace": "com.trese.db.model",
    "type": "record",
    "doc": "This Schema describes about Product",
    "name": "Product",
    "fields": [
        {"name": "product_id", "type": "long"},
        {"name": "product_name", "type": "string", "doc": "This is the name of the product"},
        {"name": "cost", "type": "float", "aliases": ["price"]},
        {"name": "discount", "type": "float", "default": 5}
    ]
}
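
For illustration, here is a minimal sketch of writing and reading a container file with this schema. It uses the third-party Python library fastavro; the file name product.avro and the sample records are just placeholders:

from fastavro import writer, reader, parse_schema

# Same Product schema as above (doc strings and aliases omitted for brevity).
product_schema = parse_schema({
    "namespace": "com.trese.db.model",
    "type": "record",
    "name": "Product",
    "fields": [
        {"name": "product_id", "type": "long"},
        {"name": "product_name", "type": "string"},
        {"name": "cost", "type": "float"},
        {"name": "discount", "type": "float", "default": 5.0}
    ]
})

records = [
    {"product_id": 1, "product_name": "laptop", "cost": 799.0, "discount": 5.0},
    {"product_id": 2, "product_name": "mouse", "cost": 19.5, "discount": 0.0}
]

# The writer stores the JSON schema once in the file header, followed by the
# records in compact binary form (no field names repeated per record).
with open("product.avro", "wb") as out:
    writer(out, product_schema, records)

# The reader decodes the binary records using the schema found in the header.
with open("product.avro", "rb") as fo:
    for rec in reader(fo):
        print(rec)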

Now my question is: why do we need evolution? I have read that we can use a default in the schema for new fields; but if we add a new schema to the file, the earlier schema will be overwritten. We cannot have two schemas for a single file.

Another question is: what are reader and writer schemas, and how do they help?

Sep 19, 2018 in Big Data Hadoop by Neha

1 answer to this question.


If you have one Avro file and you want to change its schema, you can rewrite that file with the new schema inside. But what if you have terabytes of Avro files and you want to change their schema? Will you rewrite all of the data every time the schema changes?

Schema evolution lets you update the schema used to write new data while maintaining backwards compatibility with the schema(s) of your old data. Then you can read it all together, as if all of the data had one schema. Of course, there are precise rules governing which changes are allowed in order to maintain compatibility. Those rules are listed under Schema Resolution in the Avro specification.
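
As a rough sketch of what that looks like (using the Python fastavro library as an assumption; the v1/v2 schemas below are just illustrative), data written with an old schema can be read through a newer reader schema, provided any added field declares a default as the Schema Resolution rules require:

from io import BytesIO
from fastavro import writer, reader

# Old writer schema -- "discount" does not exist yet.
schema_v1 = {
    "type": "record", "name": "Product",
    "fields": [
        {"name": "product_id", "type": "long"},
        {"name": "product_name", "type": "string"}
    ]
}

# New reader schema -- adds "discount" with a default, which is what the
# resolution rules require for a field the old writer never wrote.
schema_v2 = {
    "type": "record", "name": "Product",
    "fields": [
        {"name": "product_id", "type": "long"},
        {"name": "product_name", "type": "string"},
        {"name": "discount", "type": "float", "default": 0.0}
    ]
}

buf = BytesIO()
writer(buf, schema_v1, [{"product_id": 1, "product_name": "laptop"}])
buf.seek(0)

# Old data, new schema: the missing field is filled in from its default.
for rec in reader(buf, reader_schema=schema_v2):
    print(rec)   # {'product_id': 1, 'product_name': 'laptop', 'discount': 0.0}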

There are other use cases for reader and writer schemas beyond evolution. You can use a reader schema as a filter: imagine data with hundreds of fields, of which you are only interested in a handful; you can create a reader schema for just those fields and read only the data you need. You can also go the other way and create a reader schema that adds default data, or use a reader schema to join the schemas of two different datasets.
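
For example (same assumed fastavro setup as above), a reader schema that lists only the fields you care about acts as a projection over wider records:

from io import BytesIO
from fastavro import writer, reader

full_schema = {
    "type": "record", "name": "Product",
    "fields": [
        {"name": "product_id", "type": "long"},
        {"name": "product_name", "type": "string"},
        {"name": "cost", "type": "float"},
        {"name": "discount", "type": "float"}
    ]
}

# Reader schema keeps only the two fields we actually need; the rest of
# each record is skipped during decoding.
projection = {
    "type": "record", "name": "Product",
    "fields": [
        {"name": "product_id", "type": "long"},
        {"name": "cost", "type": "float"}
    ]
}

buf = BytesIO()
writer(buf, full_schema,
       [{"product_id": 1, "product_name": "laptop", "cost": 799.0, "discount": 5.0}])
buf.seek(0)

for rec in reader(buf, reader_schema=projection):
    print(rec)   # {'product_id': 1, 'cost': 799.0}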

Or you can just use one schema, which never changes, for both reading and writing. That's the simplest case.

answered Sep 19, 2018 by Frankie

