How to fix corrupt HDFS FIles?

0 votes

How does someone fix a HDFS that's corrupt? I looked on the Apache/Hadoop website and it said its fsck command, which doesn't fix it. Hopefully someone who has run into this problem before can tell me how to fix this.

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures.

When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or missing blocks. How do I make it not corrupt? This is on a practice machine so I COULD blow everything away but when we go live, I won't be able to "fix" it by blowing everything away so I'm trying to figure it out now.

Oct 25, 2018 in Big Data Hadoop by Neha
• 6,280 points
379 views

1 answer to this question.

0 votes

You can use

  hdfs fsck /

to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose especially on a large HDFS filesystem so I normally get down to the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

which ignores lines with nothing but dots and lines talking about replication.

Once you find a file that is corrupt

  hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.

You can use the reported block numbers to go around to the datanodes and the namenode logs searching for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines. Missing mount points, datanode not running, file system reformatted/reprovisioned. If you can find a problem in that way and bring the block back online that file will be healthy again.

Lather rinse and repeat until all files are healthy or you exhaust all alternatives looking for the blocks

answered Oct 25, 2018 by Frankie
• 9,810 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to fix corrupt files on HDFS

1 - Spark if following slave/master architecture. So ...READ MORE

answered 3 days ago in Big Data Hadoop by ravikiran
• 3,560 points
14 views
0 votes
1 answer

How to discover missing or corrupt HDFS data?

HDFS supports fsck command to check for ...READ MORE

answered Sep 19, 2018 in Big Data Hadoop by nitinrawat895
• 10,110 points
61 views
0 votes
1 answer

Hadoop HDFS: How to delete old files from HDFS?

You can use commands like this: hdfs dfs ...READ MORE

answered Nov 15, 2018 in Big Data Hadoop by Omkar
• 67,140 points
917 views
0 votes
1 answer

How to read HDFS and local files with the same code in Java?

You can try something like this: ​ ...READ MORE

answered Nov 22, 2018 in Big Data Hadoop by Omkar
• 67,140 points
205 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,110 points
2,058 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,110 points
198 views
0 votes
10 answers

hadoop fs -put command?

copy command can be used to copy files ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Sujay
10,534 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,240 points
771 views
0 votes
1 answer

How to add user in supergroup of hdfs in linux?

Yes , now i have whole idea ...READ MORE

answered Sep 20, 2018 in Big Data Hadoop by Frankie
• 9,810 points
1,357 views
0 votes
1 answer

What is the standard way to create files in your hdfs file-system?

Well, it's so easy. Just enter the below ...READ MORE

answered Sep 22, 2018 in Big Data Hadoop by Frankie
• 9,810 points
68 views