What are SUCCESS and part-r-00000 files in Hadoop?


I run Hadoop on a VirtualBox VM running CentOS. Whenever I submit a job to Hadoop, two different files are always created in the output: one named _SUCCESS and one named part-r-00000. I have noticed that the actual output is always in the part-r-00000 file, so why is the _SUCCESS file created alongside it? What is the significance of these files, or are they just created randomly?

Apr 12, 2018 in Big Data Hadoop by Shubham


Yes, both files, i.e. _SUCCESS and part-r-00000, are created by default.

On successful completion of a job, the MapReduce runtime creates an empty _SUCCESS marker file in the output directory. This is useful for applications that need to check whether a result set is complete just by inspecting HDFS.
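For example, a downstream consumer can test for the marker before reading the results. This is a minimal sketch using the HDFS FileSystem API; the output path is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // _SUCCESS is an empty marker file written only when the job finished successfully
        Path marker = new Path("/user/hadoop/output/_SUCCESS"); // hypothetical output directory
        if (fs.exists(marker)) {
            System.out.println("Job output is complete; safe to read part-* files.");
        } else {
            System.out.println("No _SUCCESS marker; output may be partial or still running.");
        }
    }
}
```

The same check can be done from the command line with `hadoop fs -test -e <output-dir>/_SUCCESS`.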

The output files are named part-x-yyyyy by default, where:
1) x is either 'm' or 'r', depending on whether the file was written by a map task or a reduce task (map-only jobs produce part-m files)
2) yyyyy is the mapper or reducer task number (zero-based)

So if a job has 10 reducers, the files generated will be named part-r-00000 through part-r-00009, one for each reducer task.

It is possible to change the default base name. All you need to do in the Driver class is set this property before submitting the job:

job.getConfiguration().set("mapreduce.output.basename", "edureka");
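To put that one line in context, here is a minimal driver sketch, assuming the default identity mapper and reducer; the class name and input/output paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RenamedOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "renamed output example");
        job.setJarByClass(RenamedOutputDriver.class);

        // Output files will be named edureka-r-00000, edureka-r-00001, ...
        // instead of the default part-r-00000, part-r-00001, ...
        job.getConfiguration().set("mapreduce.output.basename", "edureka");

        FileInputFormat.addInputPath(job, new Path(args[0]));   // hypothetical input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // hypothetical output path

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that only the "part" prefix changes; the -r-yyyyy suffix and the _SUCCESS marker are still produced as before.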

Hope this answers your question.

answered Apr 12, 2018 by nitinrawat895

