@sarveshseri
Last active November 18, 2015 10:35

Fix corrupt file system

First, switch to the hdfs user:

sudo su hdfs
Check file system health
hdfs fsck /
# or
hdfs fsck hdfs://hdfsHost:port/
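
For scripted checks, the fsck summary can be tested instead of read by eye. This is a minimal sketch: it assumes the report ends with a status line containing "is HEALTHY" (as in "The filesystem under path '/' is HEALTHY"), and the function name `fsck_is_healthy` is made up for this example.

```shell
# Succeeds (exit status 0) if the fsck report on stdin contains the
# HEALTHY status line, and fails otherwise.
# Assumption: the report includes a line such as
#   The filesystem under path '/' is HEALTHY
fsck_is_healthy() {
  grep "is HEALTHY" > /dev/null
}

# Usage (assumes the hdfs CLI is on PATH):
#   hdfs fsck / | fsck_is_healthy && echo "file system is healthy"
```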
Find corrupt files
hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

The output will contain lines like this:

/path/to/corrupt/file: MISSING 1 blocks of total size 6153 B.................................
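
To work with the affected files in bulk, the paths can be pulled out of lines like the one above. The sketch below assumes each affected line has the form `<path>: MISSING ...` or `<path>: CORRUPT ...`; the exact wording may differ between Hadoop versions, and `extract_corrupt_paths` is a hypothetical helper name.

```shell
# Read fsck output on stdin and print the path of every file reported
# with MISSING or CORRUPT blocks, one per line, without duplicates.
# Assumption: affected lines look like "<path>: MISSING ..." or
# "<path>: CORRUPT ...".
extract_corrupt_paths() {
  grep -E ': (MISSING|CORRUPT) ' \
    | sed -E 's/: (MISSING|CORRUPT) .*$//' \
    | sort -u
}

# Usage (assumes the hdfs CLI is on PATH):
#   hdfs fsck / | extract_corrupt_paths > corrupt_files.txt
```

Depending on the Hadoop version, `hdfs fsck / -list-corruptfileblocks` may give a similar list directly.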

There are now two choices: delete the corrupt files, or try to repair them if they are important.

Delete corrupt files
hdfs dfs -rm -skipTrash /path/to/corrupt/file
# or
hdfs dfs -rm -skipTrash hdfs://hdfsHost:port/path/to/corrupt/file
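
When many files are affected, the delete can be looped over a list of paths. This is a sketch, not a vetted tool: `delete_listed_files` and the `DRY_RUN` variable are names invented here, and the function only prints the commands unless `DRY_RUN=0` is set, so the list can be reviewed first.

```shell
# Read file paths (one per line) on stdin and remove each from HDFS.
# By default (DRY_RUN unset or 1) the commands are printed, not run.
delete_listed_files() {
  while IFS= read -r path; do
    # Skip empty lines.
    [ -n "$path" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'hdfs dfs -rm -skipTrash %s\n' "$path"
    else
      # Redirect stdin so the command cannot consume the path list.
      hdfs dfs -rm -skipTrash "$path" < /dev/null
    fi
  done
}

# Usage: review the dry run first, then delete for real.
#   delete_listed_files < corrupt_files.txt
#   DRY_RUN=0 delete_listed_files < corrupt_files.txt
```

As an alternative, fsck has a `-delete` option (`hdfs fsck /some/path -delete`) that removes the corrupt files under a path in one pass.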
Try to fix corrupt files

Success depends on whether the missing blocks still exist on some node. Start by gathering the block information for the file:

hdfs fsck /path/to/corrupt/file -locations -blocks -files

With luck, this output lets you track down the node where the corruption is. With more luck and some perseverance, you may be able to fix the corruption.
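
One way to start tracking a block down is to pull the block IDs out of the fsck report and then search for them on the DataNodes. The sketch below assumes block IDs appear in the report as `blk_<number>` (optionally followed by `_<generation stamp>`); `extract_block_ids` is a hypothetical helper name, and the data directory in the usage comment is a placeholder for whatever `dfs.datanode.data.dir` is set to on your cluster.

```shell
# Read fsck output on stdin and print each distinct block ID
# (for example blk_1073741825), one per line.
# Assumption: IDs are written as "blk_" followed by an optionally
# negative integer; the trailing generation stamp is dropped.
extract_block_ids() {
  grep -oE 'blk_-?[0-9]+' | sort -u
}

# Usage (assumes the hdfs CLI is on PATH):
#   hdfs fsck /path/to/corrupt/file -locations -blocks -files | extract_block_ids
#
# Then, on a DataNode, look for the block file under the data directory
# (DATA_DIR is a placeholder for the dfs.datanode.data.dir value):
#   find "$DATA_DIR" -name 'blk_1073741825*'
```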
