This is what happens:
The MapReduce framework stores intermediate output on the local disk rather than in HDFS, because writing it to HDFS would cause unnecessary replication of files.
Once the whole map computation has finished, everything is eventually merged and written to disk, and becomes the input for the shuffle and sort stages that precede the Reducer.
Mapper output (intermediate data) is written to the local file system (NOT HDFS) of each mapper slave node. Once the data has been transferred to the Reducer, we won’t be able to access these temporary files.
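To make the flow above concrete, here is a toy simulation written with only the JDK. It is not Hadoop code, and the class and method names (`LocalSpillSketch`, `runMapTask`, `readPartition`) are made up for illustration. It assumes a word-count style map step and writes one sorted file per reducer into a temporary directory that stands in for the node's local disk. Real Hadoop merges its spills into a single partitioned file, so treat this only as a sketch of the idea.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy simulation of a map task writing intermediate data locally; not Hadoop code. */
public class LocalSpillSketch {

    /** Hash partitioning: choose which reducer will receive a given key. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Emits a (word, 1) record for every word, groups the records by target
     * reducer, sorts each group by key, and writes one file per reducer into
     * a local temporary directory (the stand-in for the mapper's local disk).
     */
    static List<Path> runMapTask(List<String> lines, int numReducers) {
        // One in-memory buffer of keys per reducer.
        List<List<String>> buffers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buffers.add(new ArrayList<>());
        }
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    buffers.get(partitionFor(word, numReducers)).add(word);
                }
            }
        }
        try {
            Path localDir = Files.createTempDirectory("map-task-local-");
            List<Path> files = new ArrayList<>();
            for (int p = 0; p < numReducers; p++) {
                List<String> keys = buffers.get(p);
                Collections.sort(keys); // sorted within each partition
                List<String> records = new ArrayList<>();
                for (String key : keys) {
                    records.add(key + "\t1");
                }
                Path file = localDir.resolve("partition-" + p + ".out");
                Files.write(file, records);
                files.add(file);
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back one partition file, as a reducer would after fetching it. */
    static List<String> readPartition(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = runMapTask(Arrays.asList("b a", "a c"), 2);
        for (Path file : files) {
            System.out.println(file.getFileName() + " -> " + readPartition(file));
        }
    }
}
```

The files live in a temporary directory on the machine that ran the task, which mirrors why this data never shows up in HDFS and is not something you can rely on after the job has consumed it.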
However, there is MultipleOutputFormat, which allows you to define multiple file names for the output of the Mapper or Reducer.
For further insight into MultipleOutputFormat, refer to the links below: