Serde input and output

0 votes

What is the link between Input and output format with respect to serde?

Dec 27, 2018 in Big Data Hadoop by digger
• 27,640 points
32 views

1 answer to this question.

0 votes

Input Processing

Hive's execution engine (referred to as just engine henceforth) first uses the configured InputFormat to read in a record of data (the value object returned by the RecordReader of the InputFormat).

The engine then invokes Serde.deserialize() to perform deserialization of the record. There is no real binding that the deserialized object returned by this method indeed be a fully deserialized one.

The engine also gets hold of the ObjectInspector to use by invoking Serde.getObjectInspector(). This has to be a subclass of structObjectInspector since a record representing a row of input data is essentially a struct type.

The engine passes the deserialized object and the object inspector to all operators for their use in order to get the needed data from the record. The object inspector knows how to construct individual fields out of a deserialized record. For example, StructObjectInspector has a method called getStructFieldData() which returns a certain field in the record. This is the mechanism to access individual fields.

Output Processing

Output is analogous to input. The engine passes the deserialized Object representing a record and the corresponding ObjectInspector to Serde.serialize(). In this context serialization means converting the record object to an object of the type expected by the OutputFormat which will be used to perform the write. To perform this conversion, the serialize() method can make use of the passed ObjectInspector to get the individual fields in the record in order to convert the record to the appropriate type.

answered Dec 27, 2018 by Omkar
• 67,380 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Can we use different input and output format classes?

Yes, InputFormatClass and OutputFormatClass are independent of ...READ MORE

answered Jul 22 in Big Data Hadoop by Jishan
21 views
0 votes
1 answer

How to solve error caused due to output types of mapper and reducer not matching?

job.setOutputValueClass will set the types expected as ...READ MORE

answered Jul 9 in Big Data Hadoop by Rishab
26 views
0 votes
1 answer

Output types of mapper and reducer does not match

job.setOutputValueClass will set the types expected as ...READ MORE

answered Jul 22 in Big Data Hadoop by Reena
30 views
0 votes
10 answers

What is the difference between Mongodb and Hadoop?

Apart from the similarity that they are ...READ MORE

answered Dec 6, 2018 in Big Data Hadoop by Deeraj
2,431 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
2,686 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
282 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
13,371 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,240 points
980 views
0 votes
1 answer

How to concatenate hdfs files and store in output file?

You can use a combination of cat and put command. Something ...READ MORE

answered Dec 5, 2018 in Big Data Hadoop by Omkar
• 67,380 points
421 views
0 votes
1 answer

Difference between hive.exec.compress.output=true; and mapreduce.output.fileoutputformat.compress=true;

Hey there! The definition of these two properties ...READ MORE

answered Dec 28, 2018 in Big Data Hadoop by Omkar
• 67,380 points
469 views