How to fix corrupt HDFS files?

0 votes

How does someone fix an HDFS that's corrupt? I looked on the Apache Hadoop website, and it points to the fsck command, which doesn't actually fix anything. Hopefully someone who has run into this problem before can tell me how to fix it.

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally, the NameNode automatically corrects most of the recoverable failures.

When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or had missing blocks. How do I make them not corrupt? This is on a practice machine, so I COULD blow everything away, but when we go live I won't be able to "fix" things by blowing everything away, so I'm trying to figure it out now.

Oct 25, 2018 in Big Data Hadoop by Neha

1 answer to this question.

0 votes

You can use

  hdfs fsck /

to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

which ignores lines with nothing but dots (fsck's progress output) and lines talking about replication; matching on 'eplica' catches both 'replica' and 'Replica'.
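If your Hadoop version supports it, fsck can also report just the corrupt entries directly and skip the grep gymnastics; check hdfs fsck's usage output on your cluster for the -list-corruptfileblocks flag:

  hdfs fsck / -list-corruptfileblocks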

Once you find a file that is corrupt, run

  hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where the blocks might live. If the file is larger than your block size, it will span multiple blocks.
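Once you have a block ID from that output, you can look for the underlying block file on a suspect DataNode. A minimal sketch, assuming a hypothetical block ID blk_1073741825 and that the DataNode stores its blocks under /data/dfs/dn (check dfs.datanode.data.dir in your hdfs-site.xml for the real path):

  # run on a suspect DataNode; block files are named blk_<id>,
  # with a blk_<id>_<genstamp>.meta checksum file alongside
  find /data/dfs/dn -name 'blk_1073741825*' 2>/dev/null

If the block file turns up on a disk that simply wasn't mounted, remounting it and restarting the DataNode is often enough to bring the block back.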

You can use the reported block numbers to search the DataNode and NameNode logs for the machine or machines on which the blocks lived. Then look for filesystem errors on those machines: missing mount points, a DataNode that isn't running, a file system that was reformatted or reprovisioned. If you can find a problem that way and bring the block back online, the file will be healthy again.
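To narrow down which machines held a block, grepping the logs for the block ID usually works. A sketch using the same hypothetical block ID; log locations vary by distribution, so adjust the path to wherever your NameNode and DataNode logs actually live:

  # run on the NameNode, then repeat on any DataNodes it implicates
  grep 'blk_1073741825' /var/log/hadoop/hdfs/*.log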

Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.
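If a block is genuinely unrecoverable, the last resort is to get the corrupt files out of the namespace so fsck reports the filesystem healthy again. Both of these are standard fsck flags, but they discard data, so only use them after you've given up on recovering the blocks:

  # move the corrupt files to /lost+found
  hdfs fsck / -move

  # or delete the corrupt files outright
  hdfs fsck / -delete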

answered Oct 25, 2018 by Frankie
