Can we use different input and output format classes in MapReduce code?

Question

Can we use different input and output format classes (say, TextInputFormat for one and KeyValueTextInputFormat for the other)?

job.setInputFormatClass(TextInputFormat.class);

job.setOutputFormatClass(TextOutputFormat.class);

score 0 · Answer 1 · Jul 10, 2019

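Yes. The input format class and the output format class are set independently, so any InputFormat can be combined with any OutputFormat. The only coupling is through the key/value types: the Mapper's input types must match what the InputFormat produces (LongWritable/Text for TextInputFormat, Text/Text for KeyValueTextInputFormat), and the job's output key/value classes must be ones the chosen OutputFormat can write.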
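Because the two classes are set independently, the only thing that has to line up is the types. The hypothetical FormatPairingSketch below models a map-only job in plain Java (no Hadoop classes; all names are invented for illustration): the input-format function fixes the mapper's input types, the mapper's result fixes what the output-format function receives, and the two format functions never refer to each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical plain-Java model of a map-only job; not Hadoop code.
// inputFormat turns a raw line into a (K1, V1) record, the mapper turns
// that into (K2, V2), and outputFormat turns (K2, V2) back into a line.
final class FormatPairingSketch {

    static <K1, V1, K2, V2> List<String> runMapOnly(
            List<String> lines,
            Function<String, Map.Entry<K1, V1>> inputFormat,
            BiFunction<K1, V1, Map.Entry<K2, V2>> mapper,
            Function<Map.Entry<K2, V2>, String> outputFormat) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            // Only the mapper has to agree with both sides: its parameter
            // types K1/V1 are fixed by the input format, and its result
            // type K2/V2 is exactly what the output format receives.
            Map.Entry<K1, V1> record = inputFormat.apply(line);
            Map.Entry<K2, V2> result = mapper.apply(record.getKey(), record.getValue());
            output.add(outputFormat.apply(result));
        }
        return output;
    }
}
```

For instance, calling runMapOnly with a parser that splits "a\t1" at the tab, a mapper returning (key.toUpperCase(), Integer.parseInt(value) * 10) and a formatter that joins key and value with a tab yields ["A\t10"]. Replacing the parser with one that emits a Long key changes only K1 and the mapper's first parameter type; the output side is untouched.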

InputFormat

InputFormat defines how input files are split up and read in Hadoop. The data for a MapReduce job initially lives in input files, which typically reside in HDFS. The format of these files is arbitrary; both line-based log files and binary formats can be used. The InputFormat divides the input into InputSplits, each processed by one map task, and supplies a RecordReader that turns each split into the key-value pairs passed to the Mapper.
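To make the difference between the two input formats named in the question concrete, here is a simplified in-memory imitation, in plain Java without Hadoop, of the records each one hands to the Mapper. The class and method names are hypothetical; the sketch ignores splits, compression and "\r\n" line endings, and models LongWritable/Text as Long/String.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified imitation of the records TextInputFormat and
// KeyValueTextInputFormat pass to the Mapper; not Hadoop code. Splits,
// compression and "\r\n" endings are ignored, and LongWritable/Text are
// modelled as Long/String.
final class InputFormatSketch {

    // TextInputFormat-style: key = byte offset of the line within the file,
    // value = the line without its terminating '\n'.
    static List<Map.Entry<Long, String>> textInputRecords(String contents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < contents.length()) {
            int end = contents.indexOf('\n', start);
            if (end < 0) {
                end = contents.length(); // last line has no trailing newline
            }
            String line = contents.substring(start, end);
            records.add(new SimpleImmutableEntry<>(offset, line));
            // Offsets count UTF-8 bytes, plus one byte for the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style: split each line at the first tab (the
    // default separator). A line without a tab becomes (whole line, "").
    static List<Map.Entry<String, String>> keyValueRecords(String contents) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (Map.Entry<Long, String> record : textInputRecords(contents)) {
            String line = record.getValue();
            int tab = line.indexOf('\t');
            if (tab < 0) {
                records.add(new SimpleImmutableEntry<>(line, ""));
            } else {
                records.add(new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1)));
            }
        }
        return records;
    }
}
```

For the file contents "hello\tworld\nfoo\n", textInputRecords yields (0, "hello\tworld") and (12, "foo"), while keyValueRecords yields ("hello", "world") and ("foo", ""). A Mapper written for one of these record shapes therefore has to change its input types if the InputFormat is swapped, regardless of which OutputFormat the job uses.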

OutputFormat

The Hadoop OutputFormat describes the output specification of a MapReduce job and checks it before the job runs (for example, FileOutputFormat rejects the job if the output directory already exists). It also supplies the RecordWriter implementation used to write the output files. The Reducer takes the intermediate key-value pairs produced by the Mapper, runs the reduce function on them, and emits zero or more key-value pairs; the RecordWriter writes these pairs to the output files. The OutputFormat implementations provided by Hadoop write to files in HDFS or on the local disk.
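The RecordWriter side can be sketched the same way. The hypothetical TextOutputSketch below imitates the line layout that TextOutputFormat's RecordWriter produces, assuming the default tab separator: key, separator, value, newline, with a missing key or value (null/NullWritable in Hadoop, plain Java null here) dropped together with the separator. It is an illustration in plain Java, not the Hadoop class.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical, simplified imitation of TextOutputFormat's line layout;
// not the Hadoop class itself. Any java.io.Writer can be the target.
final class TextOutputSketch {
    private final Writer out;
    private final String separator;

    TextOutputSketch(Writer out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    // Writes one key-value pair as a line. Plain Java null stands in for
    // Hadoop's null/NullWritable: a missing side is skipped together with
    // the separator, and a pair with both sides missing writes nothing.
    void write(Object key, Object value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return;
        }
        if (!nullKey) {
            out.write(key.toString());
        }
        if (!nullKey && !nullValue) {
            out.write(separator);
        }
        if (!nullValue) {
            out.write(value.toString());
        }
        out.write("\n");
    }
}
```

Writing ("apple", 3), (null, "only-value"), ("only-key", null) and (null, null) to a StringWriter with a "\t" separator produces "apple\t3\nonly-value\nonly-key\n"; the last pair writes nothing. Note that the writer only sees the final key-value pairs, which is why it does not matter which InputFormat produced the job's input.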