How to resolve corrupted blocks in Hadoop?

0 votes
Hi. I am new to hadoop and had stored some data on hdfs. Now the data in the blocks has been corrupted. How can I recover them?
Jul 5, 2019 in Big Data Hadoop by Tarun

1 answer to this question.

0 votes

You can use

  hdfs fsck /

to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose especially on a large HDFS filesystem so I normally get down to the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

which ignores lines with nothing but dots and lines talking about replication.

Once you find a file that is corrupt

  hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.

You can use the reported block numbers to go around to the datanodes and the namenode logs searching for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines. Missing mount points, datanode not running, file system reformatted/reprovisioned. If you can find a problem in that way and bring the block back online that file will be healthy again.

Lather rinse and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.

Once you determine what happened and you cannot recover any more blocks, just use the

  hdfs fs -rm /path/to/file/with/permanently/missing/blocks

command to get your HDFS filesystem back to healthy so you can start tracking new errors as they occur.

answered Jul 5, 2019 by Nandini

Related Questions In Big Data Hadoop

0 votes
1 answer

How to configure secondary namenode in Hadoop 2.x ?

bin/ start [namenode | secondarynamenode | datanode ...READ MORE

answered Apr 6, 2018 in Big Data Hadoop by kurt_cobain
• 9,310 points
+1 vote
2 answers

How to authenticate username & password while using Connector for Cloudera Hadoop in Tableau?

Hadoop server installed was kerberos enabled server. ...READ MORE

answered Aug 21, 2018 in Big Data Hadoop by Priyaj
• 57,550 points
0 votes
11 answers
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,920 points
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,920 points
+1 vote
11 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyF ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,310 points
0 votes
1 answer

How to run Hadoop in Docker containers?

Hi, You can run Hadoop in Docker container. Follow ...READ MORE

answered Jan 24 in Big Data Hadoop by MD
• 41,340 points
0 votes
7 answers

How to run a jar file in hadoop?

I used this command to run my ...READ MORE

answered Dec 10, 2018 in Big Data Hadoop by Dasinto