Serde input and output

–1 vote

What is the link between Input and output format with respect to serde?

Dec 27, 2018 in Big Data Hadoop by digger
• 26,550 points
37 views

1 answer to this question.

0 votes

Input Processing

Hive's execution engine (referred to as just engine henceforth) first uses the configured InputFormat to read in a record of data (the value object returned by the RecordReader of the InputFormat).

The engine then invokes Serde.deserialize() to perform deserialization of the record. There is no real binding that the deserialized object returned by this method indeed be a fully deserialized one.

The engine also gets hold of the ObjectInspector to use by invoking Serde.getObjectInspector(). This has to be a subclass of structObjectInspector since a record representing a row of input data is essentially a struct type.

The engine passes the deserialized object and the object inspector to all operators for their use in order to get the needed data from the record. The object inspector knows how to construct individual fields out of a deserialized record. For example, StructObjectInspector has a method called getStructFieldData() which returns a certain field in the record. This is the mechanism to access individual fields.

Output Processing

Output is analogous to input. The engine passes the deserialized Object representing a record and the corresponding ObjectInspector to Serde.serialize(). In this context serialization means converting the record object to an object of the type expected by the OutputFormat which will be used to perform the write. To perform this conversion, the serialize() method can make use of the passed ObjectInspector to get the individual fields in the record in order to convert the record to the appropriate type.

answered Dec 27, 2018 by Omkar
• 67,660 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Can we use different input and output format classes?

Yes, InputFormatClass and OutputFormatClass are independent of ...READ MORE

answered Jul 22 in Big Data Hadoop by Jishan
29 views
0 votes
1 answer

How to solve error caused due to output types of mapper and reducer not matching?

job.setOutputValueClass will set the types expected as ...READ MORE

answered Jul 9 in Big Data Hadoop by Rishab
35 views
0 votes
1 answer

Output types of mapper and reducer does not match

job.setOutputValueClass will set the types expected as ...READ MORE

answered Jul 22 in Big Data Hadoop by Reena
115 views
0 votes
10 answers

What is the difference between Mongodb and Hadoop?

Apart from the similarity that they are ...READ MORE

answered Dec 6, 2018 in Big Data Hadoop by Deeraj
2,714 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
3,388 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
408 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
16,919 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
1,235 views
0 votes
1 answer

How to concatenate hdfs files and store in output file?

You can use a combination of cat and put command. Something ...READ MORE

answered Dec 5, 2018 in Big Data Hadoop by Omkar
• 67,660 points
601 views
0 votes
1 answer

Difference between hive.exec.compress.output=true; and mapreduce.output.fileoutputformat.compress=true;

Hey there! The definition of these two properties ...READ MORE

answered Dec 28, 2018 in Big Data Hadoop by Omkar
• 67,660 points
755 views