What are SUCCESS and part-r-00000 files in Hadoop?


I run Hadoop on a VirtualBox VM running CentOS. Whenever I submit a job to Hadoop, two different files are always created in the output: one named _SUCCESS and one named part-r-00000. I have noticed that the actual output is always in the part-r-00000 file, so why is the _SUCCESS file created alongside it? What is the significance of these files, or are they just created randomly?

Apr 12, 2018 in Big Data Hadoop by Shubham


Yes, both files, i.e. _SUCCESS and part-r-00000, are created by default.

On successful completion of a job, the MapReduce runtime creates an empty _SUCCESS marker file in the output directory. This is useful for applications that need to check whether a result set is complete just by inspecting HDFS.
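For example, a downstream consumer can test for the marker before reading the results. This is a minimal sketch using the HDFS FileSystem API; the output path is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // _SUCCESS is an empty marker file written only when the job finished successfully
        Path marker = new Path("/user/hadoop/output/_SUCCESS"); // hypothetical output directory
        if (fs.exists(marker)) {
            System.out.println("Job output is complete; safe to read part-* files.");
        } else {
            System.out.println("No _SUCCESS marker; output may be partial or still running.");
        }
    }
}
```

The same check can be done from the command line with `hadoop fs -test -e <output-dir>/_SUCCESS`.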

The output files are named part-x-yyyyy by default, where:
1) x is either 'm' or 'r', depending on whether the file was written by a map task or a reduce task (map-only jobs produce part-m files)
2) yyyyy is the mapper or reducer task number (zero-based)

So if a job has 10 reducers, the files generated will be named part-r-00000 through part-r-00009, one for each reducer task.

It is possible to change the default base name. All you need to do in the Driver class is set this property before submitting the job:

job.getConfiguration().set("mapreduce.output.basename", "edureka");
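To put that one line in context, here is a minimal driver sketch, assuming the default identity mapper and reducer; the class name and input/output paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RenamedOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "renamed output example");
        job.setJarByClass(RenamedOutputDriver.class);

        // Output files will be named edureka-r-00000, edureka-r-00001, ...
        // instead of the default part-r-00000, part-r-00001, ...
        job.getConfiguration().set("mapreduce.output.basename", "edureka");

        FileInputFormat.addInputPath(job, new Path(args[0]));   // hypothetical input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // hypothetical output path

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that only the "part" prefix changes; the -r-yyyyy suffix and the _SUCCESS marker are still produced as before.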

Hope this answers your question.

answered Apr 12, 2018 by nitinrawat895

