SerDe input and output

–1 vote

What is the link between input and output formats with respect to SerDe?

Dec 27, 2018 in Big Data Hadoop by digger
• 26,700 points
112 views

1 answer to this question.

0 votes

Input Processing

Hive's execution engine (referred to simply as "the engine" from here on) first uses the configured InputFormat to read a record of data; the record is the value object returned by the InputFormat's RecordReader.

The engine then invokes SerDe.deserialize() to deserialize the record. Note that nothing requires the object returned by this method to be fully deserialized: it can be a lazily deserialized object whose fields are materialized only when they are actually accessed, which is how Hive's lazy SerDes (such as LazySimpleSerDe) work.

The engine also obtains the ObjectInspector to use by invoking SerDe.getObjectInspector(). This has to be a subclass of StructObjectInspector, since a record representing a row of input data is essentially a struct.
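To make this concrete, here is a minimal sketch (not from the original answer) of how a custom SerDe might build the StructObjectInspector it returns from getObjectInspector(), using Hive's standard object-inspector factories. The two-column schema (id INT, name STRING) is purely hypothetical:

import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class RowInspectorSketch {
    public static StructObjectInspector buildRowInspector() {
        // Hypothetical schema: (id INT, name STRING).
        List<String> fieldNames = Arrays.asList("id", "name");
        List<ObjectInspector> fieldOIs = Arrays.asList(
                PrimitiveObjectInspectorFactory.javaIntObjectInspector,
                PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        // A standard struct inspector over the two fields; a row is then
        // represented as a List<Object> of the field values.
        return ObjectInspectorFactory.getStandardStructObjectInspector(
                fieldNames, fieldOIs);
    }
}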

The engine passes the deserialized object and the object inspector to all operators, which use them to extract the fields they need from the record. The object inspector knows how to construct individual fields out of a deserialized record; for example, StructObjectInspector has a method getStructFieldData() that returns a given field of the record. This is the mechanism for accessing individual fields.
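Putting the input side together, the following sketch shows how a consumer in the engine's position could deserialize a record and pull out a single field via StructObjectInspector.getStructFieldData(). The SerDe instance, the raw record, and the column name are assumed to be supplied by the surrounding framework:

import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.io.Writable;

public class FieldAccessSketch {
    public static Object readField(Deserializer serde, Writable record,
                                   String column) throws Exception {
        // Deserialize the raw record into a row object (possibly lazy).
        Object row = serde.deserialize(record);

        // The row inspector must be a StructObjectInspector.
        StructObjectInspector rowOI =
                (StructObjectInspector) serde.getObjectInspector();

        // Look up the field and extract its (still possibly lazy) data.
        StructField fieldRef = rowOI.getStructFieldRef(column);
        Object fieldData = rowOI.getStructFieldData(row, fieldRef);

        // If the field is primitive, materialize it as a plain Java object.
        ObjectInspector fieldOI = fieldRef.getFieldObjectInspector();
        if (fieldOI instanceof PrimitiveObjectInspector) {
            return ((PrimitiveObjectInspector) fieldOI)
                    .getPrimitiveJavaObject(fieldData);
        }
        return fieldData;
    }
}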

Output Processing

Output is analogous to input. The engine passes the deserialized object representing a record, together with the corresponding ObjectInspector, to SerDe.serialize(). In this context, serialization means converting the record object into an object of the type expected by the OutputFormat that will perform the write. To do the conversion, serialize() can use the passed ObjectInspector to read the individual fields of the record and convert them to the appropriate output type.
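As a rough illustration of the output side, here is a sketch of a serialize()-style method that walks the row with the supplied ObjectInspector and emits a tab-delimited Text value, the type a Text-based OutputFormat would write. It handles primitive fields only; a real SerDe must also handle complex types:

import java.util.List;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class SerializeSketch {
    public static Writable serialize(Object row, ObjectInspector objInspector)
            throws SerDeException {
        // The row inspector passed by the engine is a StructObjectInspector.
        StructObjectInspector rowOI = (StructObjectInspector) objInspector;
        List<? extends StructField> fields = rowOI.getAllStructFieldRefs();

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append('\t');   // tab-delimited, as an example
            StructField field = fields.get(i);
            Object data = rowOI.getStructFieldData(row, field);
            ObjectInspector fieldOI = field.getFieldObjectInspector();
            if (fieldOI instanceof PrimitiveObjectInspector) {
                Object javaObj = ((PrimitiveObjectInspector) fieldOI)
                        .getPrimitiveJavaObject(data);
                sb.append(javaObj == null ? "\\N" : javaObj.toString());
            } else {
                throw new SerDeException("complex types not handled in this sketch");
            }
        }
        return new Text(sb.toString());
    }
}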

answered Dec 27, 2018 by Omkar
• 69,090 points
