@bjcubsfan
Last active April 7, 2017 19:23
Configure a server to email on hard drive problems

Set up email for notices

I installed ssmtp and configured it:

/etc/ssmtp/ssmtp.conf
----
#
# /etc/ssmtp/ssmtp.conf -- a config file for sSMTP sendmail.
#
# The person who gets all mail for userids < 1000
# Make this empty to disable rewriting.
root=USER_NAME
# The place where the mail goes. The actual machine name is required
# no MX records are consulted. Commonly mailhosts are named mail.domain.com
# The example will fit if you are in domain.com and you mailhub is so named.
mailhub=mailhost.amc.faa.gov
# Where will the mail seem to come from?
#rewriteDomain=y
# The full hostname
hostname=root.amc.faa.gov

I tested it and it works:

20:04:18 USER_NAME@resiliency ssmtp echo "Sample email body" | sudo mail -v -s "Sample subject" [email protected]
[<-] 220 mailhost.amc.faa.gov ESMTP Postfix
[->] HELO resiliency.faa.gov
[<-] 250 mailhost.amc.faa.gov
[->] MAIL FROM:<[email protected]>
[<-] 250 2.1.0 Ok
[->] RCPT TO:<[email protected]>
[<-] 250 2.1.5 Ok
[->] DATA
[<-] 354 End data with <CR><LF>.<CR><LF>
[->] Received: by resiliency.faa.gov (sSMTP sendmail emulation); Wed, 29 Mar 2017 20:04:53 +0000
[->] From: "root" <[email protected]>
[->] Date: Wed, 29 Mar 2017 20:04:53 +0000
[->] To: [email protected]
[->] Subject: Sample subject
[->] User-Agent: mail v14.8.16
[->] 
[->] Sample email body
[->] .
[<-] 250 2.0.0 Ok: queued as 0AB8CA0A51
[->] QUIT
[<-] 221 2.0.0 Bye

Set up SMART disk monitoring

I installed smartmontools. The disks sit behind a hardware RAID controller, so I need to work out how to monitor them through it. Here's the controller:

17:39:27 USER_NAME@resiliency /etc sudo lspci | egrep -i 'raid'
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)

It seems that I can already see behind the RAID controller:

17:46:56 USER_NAME@resiliency backups sudo smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
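
Rather than hard-coding 0-6 in later steps, the disk IDs could be pulled out of the scan output. This is only a sketch: the function name megaraid_ids is made up here, and it assumes the scan lines keep the "-d megaraid,N" form shown above.

```shell
#!/bin/sh
# Print the MegaRAID disk IDs found in `smartctl --scan` output read from stdin.
# Lines without a "-d megaraid,N" device type (e.g. plain scsi) are skipped.
megaraid_ids() {
    sed -n 's/^.* -d megaraid,\([0-9][0-9]*\) .*$/\1/p'
}

# Demo on a few lines copied from the scan output above.
megaraid_ids <<'EOF'
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
EOF
```

On the live system this would be fed with `sudo smartctl --scan | megaraid_ids`.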

Turn on SMART monitoring for each drive (0-6).

17:54:10 USER_NAME@resiliency /etc sudo smartctl -s on -S on -d megaraid,6 /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.11-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.

17:54:18 USER_NAME@resiliency /etc 
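
Typing the command seven times is error-prone, so a loop might be easier. A minimal sketch, assuming the IDs really are 0-6 and that /dev/sda is a valid handle for the controller, as in the command above. The SMARTCTL variable and the function name are my own; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Enable SMART and attribute autosave on every disk behind the controller.
# Dry run by default: SMARTCTL expands to "echo smartctl", so commands are
# printed, not executed. Set SMARTCTL="sudo smartctl" to run them for real.
SMARTCTL="${SMARTCTL:-echo smartctl}"

enable_smart_all() {
    for id in 0 1 2 3 4 5 6; do
        # $SMARTCTL is left unquoted on purpose so "sudo smartctl" splits in two.
        $SMARTCTL -s on -S on -d "megaraid,$id" /dev/sda
    done
}

enable_smart_all
```

The same loop shape should work for the later per-disk steps by swapping the smartctl options.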

Start a short SMART test for each drive (0-6)

17:58:02 USER_NAME@resiliency /etc sudo smartctl -t short -d megaraid,6 /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.11-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Wed Mar 29 18:00:05 2017

Check the output for all (0-6).

18:01:28 USER_NAME@resiliency /etc sudo smartctl -l selftest -d megaraid,6 /dev/sdb 
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.11-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6063         -
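
Reading seven logs by eye gets tedious, so the check could be scripted. This sketch assumes the log keeps the layout shown above, with the newest test on the line numbered "# 1"; the function name latest_selftest_ok is hypothetical.

```shell
#!/bin/sh
# Read `smartctl -l selftest` output on stdin. Succeed only if the most recent
# entry (the line numbered "# 1") reports "Completed without error".
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Possible use on the live system (needs root):
#   for id in 0 1 2 3 4 5 6; do
#       sudo smartctl -l selftest -d "megaraid,$id" /dev/sdb | latest_selftest_ok \
#           || echo "disk $id: latest self-test did not pass"
#   done
```

A disk with an empty self-test log also counts as a failure here, which seems like the safer default.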

All pass without error. Next, check the overall health for 0-6.

18:01:30 USER_NAME@resiliency /etc sudo smartctl -H -d megaraid,6 /dev/sdb          
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.11-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

I guess that's OK. It says to test all of them, but I don't think the info is currently accurate.
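
Since the text output is a bit ambiguous here, the smartctl exit status may be a more dependable signal for scripts. It is a bit mask, described in the EXIT STATUS section of the smartctl man page. The decoder below is a sketch; the function name and the message wording are mine, paraphrased from that section.

```shell
#!/bin/sh
# Print one line for each bit set in a smartctl exit status (bits 0-7).
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or returned bad data" \
        "SMART status is DISK FAILING" \
        "prefail attribute at or below threshold" \
        "attribute was at or below threshold in the past" \
        "error log contains errors" \
        "self-test log contains errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Possible use (needs root):
#   sudo smartctl -H -d megaraid,6 /dev/sdb >/dev/null
#   explain_smartctl_status $?
```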

Disks are:

18:07:36 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,0 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    200,049,647,616 bytes [200 GB]
Rotation Rate:    Solid State Device
18:08:39 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,1 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    200,049,647,616 bytes [200 GB]
Rotation Rate:    Solid State Device
18:08:44 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,2 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Rotation Rate:    7200 rpm
18:08:48 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,3 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Rotation Rate:    5400 rpm
18:08:51 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,4 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Rotation Rate:    5400 rpm
18:08:59 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,5 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Rotation Rate:    5400 rpm
18:09:03 USER_NAME@resiliency /etc sudo smartctl -a -d megaraid,6 /dev/sdb | egrep -i "capacity|rotation"
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Rotation Rate:    5400 rpm
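
The same inventory could be gathered in one pass. A sketch, assuming the "User Capacity:" and "Rotation Rate:" lines keep the format shown above; disk_summary is a made-up helper name.

```shell
#!/bin/sh
# Reduce `smartctl -a` output (stdin) to "capacity, rotation rate" for one disk.
disk_summary() {
    awk -F': *' '
        /^User Capacity:/ { cap = $2 }
        /^Rotation Rate:/ { rot = $2 }
        END { print cap ", " rot }
    '
}

# Possible use on the live system (needs root):
#   for id in 0 1 2 3 4 5 6; do
#       printf 'disk %s: ' "$id"
#       sudo smartctl -a -d "megaraid,$id" /dev/sdb | disk_summary
#   done
```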

Edit /etc/smartd.conf, then enable and start smartd:

sudo systemctl enable smartd.service
sudo systemctl start smartd.service

I configured it similarly to the full example:

/etc/smartd.conf
----------------
DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././08|L/../../7/08) -W 4,35,40 -m [email protected] -M test
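
For reference, here is my reading of each option on that line, based on the smartd.conf man page. These are comment lines that could sit above the directive in /etc/smartd.conf. ADDRESS is a placeholder and the summaries are paraphrased, so check them against the man page for the installed version.

```
# DEVICESCAN    monitor every device smartd finds when it scans
# -a            default checks: health status, attribute changes,
#               error log and self-test log
# -o on         enable automatic offline testing
# -S on         enable attribute autosave
# -n standby,q  skip a check if the disk is in standby (do not spin it up);
#               q suppresses the log message about the skipped check
# -s (S/../.././08|L/../../7/08)
#               short self-test every day in the 08:00 hour,
#               long self-test on day 7 (Sunday) in the 08:00 hour
# -W 4,35,40    log temperature changes of 4 degrees or more,
#               log at 35 C, warn (log and email) at 40 C
# -m ADDRESS    send warning email to ADDRESS
# -M test       send one test email each time smartd starts
```

Once the test email has arrived, -M test can probably be dropped so smartd stops mailing on every restart. Running `sudo smartd -q onecheck` should register the devices, check them once and exit, which looks like a quick way to catch config mistakes.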
