@j0ju
Last active July 25, 2016 06:11
Simple script for collectd exec plugin
#!/bin/sh
if [ "$(id -u)" -ne 0 ]; then
exec sudo "$0" "$@"
fi
HOSTNAME="${COLLECTD_HOSTNAME:-`hostname -f`}" # RH / CentOS
#HOSTNAME="${COLLECTD_HOSTNAME:-`hostname`}" # Debian/Ubuntu
INTERVAL="${COLLECTD_INTERVAL:-60}"
INTERVAL="${INTERVAL%.*}"
while :; do
# collect candidate devices: whole disks listed in /proc/partitions first
DEVS_TO_SCAN=
DEVS_TO_SCAN="$DEVS_TO_SCAN $(awk '$NF !~ "^(dm-|md|sr|zram|name|sd[a-z]+[1-9])" && $NF != "" {print "/dev/"$NF}' /proc/partitions)"
#DEVS_TO_SCAN="$DEVS_TO_SCAN /dev/disk/by-id/ata-*" # ata-$VENDOR-$MODEL-$SERNO
#DEVS_TO_SCAN="$DEVS_TO_SCAN /dev/disk/by-id/wwn-*" # wwn-$WWN
#DEVS_TO_SCAN="$DEVS_TO_SCAN /dev/disk/by-id/scsi-*" # scsi-$SCSI($WWN), scsi-$VENDOR-$MODEL-$SERNO
#DEVS_TO_SCAN="$DEVS_TO_SCAN /dev/disk/by-id/usb-*" # usb-$VENDOR-$MODEL-$SERNO
#DEVS_TO_SCAN="$DEVS_TO_SCAN /dev/sd[a-z] /dev/sd[a-z][a-z]" # fallback
found_devs=
for dev in $DEVS_TO_SCAN; do # to be extended
# ignore partitions from /dev/disk/by-*/*-part[1-9]* (including part10 and up)
case "$dev" in
*-part[1-9]* ) continue ;;
esac
# scan devices only once
realdev="$(readlink -f "$dev")"
case "$found_devs" in
*" $realdev "* ) continue ;;
esac
found_devs="$found_devs $realdev "
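# skip devices smartctl could not query; its exit status is a bit mask, and
# only bits 0-1 (bad command line, device open failed) mean "no usable output".
# Higher bits flag failing disks, which are exactly the ones worth reporting.
smartctl_output="$(smartctl -a "$realdev")"
if [ $(( $? & 3 )) -ne 0 ]; then
continue
fi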
serial="$( echo "$smartctl_output" | sed -nre 's/^Serial Number:[[:space:]]+(.*)$/\1/p' | tr '.-' '__' )"
# instance naming: device name only, serial only, or both (default)
#instance="${realdev##*/}"
#instance="$serial"
instance="${realdev##*/}_${serial}"
echo "$smartctl_output" | \
while read -r attribute _t2 _t3 _t4 _t5 _t6 type _t8 _t9 value _tX; do
case "$type" in
Old_age|Pre-fail ) ;; # only use these types
*) continue ;;
esac
case "$attribute" in # only collect certain metrics
5 ) metric="reallocated_sector-ct" ;;
9 ) metric="power_on-hours" ;;
12 ) metric="power_cycle-count" ;;
168 ) metric="sata_phy_error-count" ;;
170 ) metric="bad_block-count" ;;
173 ) metric="erase-count" ;;
190 ) metric="airflow_temperature-cel" ;;
194 ) metric="temperature-celsius" ;;
197 ) metric="current_pending-sector" ;;
198 ) metric="offline-uncorrectable" ;;
* ) continue ;;
esac
echo "PUTVAL $HOSTNAME/smart-${instance}/$metric interval=$INTERVAL N:$value"
done
done
sleep "$INTERVAL"
done
# collectd plugin config
#LoadPlugin exec
#<Plugin exec>
# Exec "collectd:disk" "/etc/collectd/collectd.conf.d/plugin_exec_smart.sh"
#</Plugin>
# sudoers
#Cmnd_Alias COLLECTD_PLUGIN_EXEC_SMART = /etc/collectd/collectd.conf.d/plugin_exec_smart.sh
#collectd ALL = (root) NOPASSWD: COLLECTD_PLUGIN_EXEC_SMART
# custom types
#raw_read_error rate:GAUGE:0:U
#spin_up time:GAUGE:0:U
#start_stop count:GAUGE:0:U
#reallocated_sector ct:GAUGE:0:U
#seek_error rate:GAUGE:0:U
#seek_time performance:GAUGE:0:U
#power_on hours:GAUGE:0:U
#spin_retry count:GAUGE:0:U
#calibration_retry count:GAUGE:0:U
#power_cycle count:GAUGE:0:U
#read_soft_error rate:GAUGE:0:U
#runtime_bad block:GAUGE:0:U
#end_to_end error:GAUGE:0:U
#reported uncorrect:GAUGE:0:U
#command timeout:GAUGE:0:U
#airflow_temperature cel:GAUGE:0:U
#temperature celsius:GAUGE:0:U
#hardware_ecc recovered:GAUGE:0:U
#reallocated_event count:GAUGE:0:U
#current_pending sector:GAUGE:0:U
#offline uncorrectable:GAUGE:0:U
#udma_crc_error count:GAUGE:0:U
#multi_zone_error rate:GAUGE:0:U
#soft_read_error rate:GAUGE:0:U
#throughput performance:GAUGE:0:U
#g_sense_error rate:GAUGE:0:U
#power_off_retract count:GAUGE:0:U
#load_retry count:GAUGE:0:U
#load_cycle count:GAUGE:0:U
#sata_phy_error count:GAUGE:0:U
#bad_block count:GAUGE:0:U
#erase count:GAUGE:0:U
#bad_cluster_table count:GAUGE:0:U
# smartctl Attributes
# smartctl -a /dev/sdX | awk '$1 ~ "^[0-9]+$" && $7 ~ "Old_age|Pre-fail" {gsub("-","_", $2); print $1" "tolower($2)" "$10}'
# 1 raw_read_error_rate 176026840
# 3 spin_up_time 0
# 4 start_stop_count 25
# 5 reallocated_sector_ct 0
# 7 seek_error_rate 307256091
# 9 power_on_hours 17826
# 10 spin_retry_count 0
# 12 power_cycle_count 26
# 168 sata_phy_error_count 0
# 184 end_to_end_error 0
# 187 reported_uncorrect 0
# 188 command_timeout 0
# 189 high_fly_writes 175
# 190 airflow_temperature_cel 36
# 191 g_sense_error_rate 0
# 192 power_off_retract_count 21
# 193 load_cycle_count 25
# 194 temperature_celsius 36
# 197 current_pending_sector 0
# 198 offline_uncorrectable 0
# 199 udma_crc_error_count 0
#SSD
# 173 erase-count
# 170 bad_block-count
# 175 bad_cluster_table-count
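
The attribute listing in the comments above is produced by filtering the columns of the `smartctl` attribute table. A minimal sketch of that filter as a function reading from stdin, so it can be tried without a disk. The name `smart_attrs` is an illustrative assumption and not part of the script:

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print
# "<id> <name> <raw value>" for each SMART attribute row read from stdin.
# Expects rows in the layout of the smartctl attribute table:
#   ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_attrs() {
  awk '$1 ~ /^[0-9]+$/ && $7 ~ /^(Old_age|Pre-fail)$/ {
    gsub("-", "_", $2)                 # e.g. Offline-Uncorrectable -> Offline_Uncorrectable
    print $1 " " tolower($2) " " $10   # column 10 is the first word of RAW_VALUE
  }'
}
```

It could be used as `smartctl -A /dev/sdX | smart_attrs`, with `/dev/sdX` standing in for a real device. Only the first word of the raw value is kept, so raw values that are not plain integers on a given drive would still need extra handling.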
j0ju commented Jul 24, 2016

The following attributes might be interesting for SSDs.
Built in as of Rev 3:

 173 Erase_Count 
 170 Bad_Block_Count   
 168 SATA_Phy_Error_Count

TODO:

 175 Bad_Cluster_Table_Count  
 235 Later_Bad_Block
 236 Unstable_Power_Count
 240 Write_Head
 241 Total_LBAs_Written
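
One way the TODO attributes could be wired in is a small lookup that mirrors the script's existing `case` mapping. This is only a sketch: the function name `todo_metric` and every metric name except `bad_cluster_table-count` are assumptions that follow the script's `<type>-<suffix>` naming, and each would need a matching custom type entry before collectd accepts it.

```shell
#!/bin/sh
# Hypothetical mapping for the TODO attributes (not part of the script).
# Prints the metric name for a SMART attribute ID, or returns 1 if the
# ID is not one of the TODO attributes.
todo_metric() {
  case "$1" in
    175 ) echo "bad_cluster_table-count" ;;  # name taken from the script's comments
    235 ) echo "later_bad-block" ;;          # assumed name
    236 ) echo "unstable_power-count" ;;     # assumed name
    240 ) echo "write-head" ;;               # assumed name
    241 ) echo "total_lbas-written" ;;       # assumed name
    *   ) return 1 ;;
  esac
}
```

Inside the attribute loop this could be called as `metric="$(todo_metric "$attribute")" || continue`. Whether a `GAUGE` type fits an ever-growing counter such as Total_LBAs_Written is a separate design choice left open here.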
