#!/bin/bash
# Install the smartmontools package first; it provides smartctl
# (apt-get install smartmontools)

# Confirm up front that sudo can obtain root privileges.
if ! sudo true
then
    echo 'Root privileges required'
    exit 1
fi

# Check every /dev/sdX and /dev/sdXX device that exists.
for drive in /dev/sd[a-z] /dev/sd[a-z][a-z]
do
    # An unmatched glob is left as a literal pattern, so skip it.
    if [[ ! -e $drive ]]; then continue; fi
    echo -n "$drive "
    # The sixth field of the "SMART overall-health ..." line is the verdict.
    smart=$(
        sudo smartctl -H "$drive" 2>/dev/null |
        grep '^SMART overall' |
        awk '{ print $6 }'
    )
    [[ "$smart" == "" ]] && smart='unavailable'
    echo "$smart"
done
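The script above matches the ATA-style line `SMART overall-health self-assessment test result: PASSED`. SAS/SCSI drives typically report `SMART Health Status: OK` instead, so they would show up as `unavailable`. A minimal sketch of a parser that accepts both forms follows; `parse_health` is a hypothetical helper name, and the two line formats are assumed from typical `smartctl -H` output rather than taken from the original script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist).
# Reads `smartctl -H` output on stdin and prints the health verdict,
# or "unavailable" if neither known status line is present.
parse_health() {
    awk -F': *' '
        /^SMART overall-health self-assessment test result/ ||
        /^SMART Health Status/ { print $2; found = 1; exit }
        END { if (!found) print "unavailable" }'
}

# Example use (commented out because it needs root and a real device):
#   sudo smartctl -H /dev/sda 2>/dev/null | parse_health
```

Splitting on the colon rather than counting fields keeps the parser independent of how many words precede the verdict.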
Ya, the uncorrected errors would definitely land that disk on my "SUS" list! Even the volume of fast ECC corrections would get it a "warn" from me. Sometimes uncorrected errors just happen though, and that isn't a big number. But I can totally see wanting that out of your array!
If it is a performance-sensitive production environment, I would 100% yank that drive just because. Those retries can cause odd performance issues for customers, and the drive is almost impossible to pinpoint as the cause.
I do have some HGSTs that are 8 years old now and still going strong, some with almost as many fast ECC errors, but I have evicted others just because some of those metrics were increasing at an unhealthy rate. I have the luxury of having 4 parity disks though. With that plus regular scrubbing, I might keep that one in my cluster unless it got worse. If I didn't have that, I would not trust it. But I am a cheapskate when it comes to my home lab!
Thanks for the heads up!
That Scrutiny utility looks awesome. Will try it out. Learned something new today.
I just scanned 28 drives I had in a JBOD array, and saw 4 that had elements in the grown defect list (41, 11, 11, 20).
One drive with 11 elements in the grown defect list also had 3 uncorrected errors. Another drive, with 20 elements in the grown defect list, had 1 uncorrected error.
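For anyone wanting to run the same kind of scan, here is a rough sketch of pulling the grown defect count out of `smartctl -a` output. It assumes the SAS/SCSI-style line `Elements in grown defect list: N`; `grown_defects` is a hypothetical helper name, and ATA drives generally do not print this line, so they would come back as `unavailable`.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not the commenter's actual method).
# Reads `smartctl -a` output on stdin and prints the grown defect count,
# or "unavailable" if the drive does not report one.
grown_defects() {
    awk -F': *' '
        /^Elements in grown defect list/ { print $2; found = 1; exit }
        END { if (!found) print "unavailable" }'
}

# Example use (commented out because it needs root and real devices):
#   for drive in /dev/sd[a-z]; do
#       [[ -e $drive ]] || continue
#       echo "$drive $(sudo smartctl -a "$drive" 2>/dev/null | grown_defects)"
#   done
```

A nonzero count is not automatically a failure, but as discussed above, a count that keeps climbing between scans is a reasonable trigger for pulling the drive.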
Here's a report for the drive with 3 errors. Will be keeping an eye on it.