This is a step-by-step guide to manually setting up Cloud Monitoring through the server-side configuration feature, so that you can get familiar with the process. The target use case is automation, such as a Chef cookbook, an Ansible playbook, or scripts.
In this example, we are going to set up the filesystem check because it is always a good idea to get notified BEFORE you run out of space. We are going to use Ubuntu as the example OS to simplify the steps.
- Upgrade agent
- Create YAML file
- Copy file to the conf.d directory
- Restart agent
Make sure that your agent is updated to a version that supports the server-side monitoring configuration feature.
~$ rackspace-monitoring-agent --version
0.2.0-33
Update the monitoring agent using apt-get:
sudo apt-get install rackspace-monitoring-agent
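For a script or cookbook, the version check can be automated. The following is a minimal sketch, assuming GNU sort with the -V option is available (it is on Ubuntu). The names version_ge and REQUIRED_VERSION are hypothetical, and the value 0.2.0-33 is only the sample version printed above, not a documented minimum; substitute the version that your release notes call for.

```shell
# Sketch only: version_ge and REQUIRED_VERSION are illustrative names.
# 0.2.0-33 is the sample version shown above, not a documented minimum.
REQUIRED_VERSION="${REQUIRED_VERSION:-0.2.0-33}"

# Succeeds when version $1 is greater than or equal to version $2,
# using the version ordering of GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   installed="$(rackspace-monitoring-agent --version)"
#   if ! version_ge "$installed" "$REQUIRED_VERSION"; then
#     sudo apt-get update && sudo apt-get install -y rackspace-monitoring-agent
#   fi
```

The comparison is kept in its own function so that the same script can be run repeatedly and only upgrades when the installed version is too old.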
After you SSH into the server, create a YAML file named filesystem.yaml:
type: agent.filesystem
label: Filesystem on /
disabled: false
period: 60
timeout: 30
details:
  target: /
alarms:
  alarm-disk-size:
    label: usage on /
    notification_plan_id: npTechnicalContactsEmail
    criteria: |
      if (percentage(metric['used'], metric['total']) > 90) {
        return new AlarmStatus(CRITICAL, 'Disk usage is above 90%, #{used} out of #{total}');
      }
      if (percentage(metric['used'], metric['total']) > 80) {
        return new AlarmStatus(WARNING, 'Disk usage is above 80%, #{used} out of #{total}');
      }
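Since the target use case is automation, the same file can be generated from a few parameters instead of being written by hand. The following is a minimal sketch; render_filesystem_check is a hypothetical helper and not part of the agent, and the two-space indentation is an assumption about how the check file is nested.

```shell
# Sketch only: render_filesystem_check is an illustrative helper, not part of
# the agent. Arguments: mount point, warning %, critical %, notification plan ID.
render_filesystem_check() {
  local target="$1" warn="$2" crit="$3" plan="$4"
  cat <<EOF
type: agent.filesystem
label: Filesystem on ${target}
disabled: false
period: 60
timeout: 30
details:
  target: ${target}
alarms:
  alarm-disk-size:
    label: usage on ${target}
    notification_plan_id: ${plan}
    criteria: |
      if (percentage(metric['used'], metric['total']) > ${crit}) {
        return new AlarmStatus(CRITICAL, 'Disk usage is above ${crit}%, #{used} out of #{total}');
      }
      if (percentage(metric['used'], metric['total']) > ${warn}) {
        return new AlarmStatus(WARNING, 'Disk usage is above ${warn}%, #{used} out of #{total}');
      }
EOF
}

# Example:
#   render_filesystem_check / 80 90 npTechnicalContactsEmail > filesystem.yaml
```

The #{used} and #{total} placeholders are left untouched by the shell because they are not preceded by a dollar sign, so the agent still receives them literally.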
If this is the first time, you will need to create the agent conf.d directory:
sudo mkdir /etc/rackspace-monitoring-agent.conf.d
Copy the file you just created to the conf.d directory:
sudo cp filesystem.yaml /etc/rackspace-monitoring-agent.conf.d
The monitoring agent reads the conf.d directory every time it restarts, so restart it now:
sudo service rackspace-monitoring-agent restart
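The directory, copy, and restart steps above can be folded into one repeatable script. The following is a minimal sketch; install_check and CONF_DIR are hypothetical names, and skipping the copy when the file is unchanged is a design choice of this sketch, not something the agent requires.

```shell
# Sketch only: install_check and CONF_DIR are illustrative names.
CONF_DIR="${CONF_DIR:-/etc/rackspace-monitoring-agent.conf.d}"

# Copies file $1 into $CONF_DIR, creating the directory if needed.
# Prints "changed" when a new or different file was installed,
# and "unchanged" when an identical copy was already in place.
install_check() {
  local src="$1" dest
  dest="$CONF_DIR/$(basename "$src")"
  mkdir -p "$CONF_DIR" || return 1
  if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
    echo unchanged
  else
    cp "$src" "$dest" || return 1
    echo changed
  fi
}

# Example (run as root):
#   if [ "$(install_check filesystem.yaml)" = changed ]; then
#     service rackspace-monitoring-agent restart
#   fi
```

Restarting only when the file changed keeps repeated runs from a configuration management tool from bouncing the agent needlessly.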
That's it! You should see your checks showing up in no time!
At the beginning, or when you need to gather more information from the agent, you can tail the log file:
sudo tail -f /var/log/rackspace-monitoring-agent.log
When all goes well, you will see lines in the log like the following.
Sun May 11 22:48:57 2014 INF: Confd -> config_file post overall success
Sun May 11 22:48:57 2014 INF: Confd -> config_file post operation result: success for file, handle: filesystem.yaml at parsing
Sun May 11 22:48:57 2014 INF: Confd -> config_file post operation result: success for check, handle: {"check":"default","filename":"filesystem.yaml"} at create
Sun May 11 22:48:57 2014 INF: Confd -> config_file post operation result: success for alarm, handle: {"alarm":"alarm-disk-size","filename":"filesystem.yaml"} at create
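In an automated run, nobody is watching the log, so the result lines can be counted instead. The following is a minimal sketch that relies only on the message text shown in the sample lines above; confd_summary is a hypothetical helper, and the log may contain results from earlier restarts as well.

```shell
# Sketch only: confd_summary is an illustrative helper, not part of the agent.
# Counts Confd operation results in log file $1, prints a one-line summary,
# and fails when at least one failure line is present.
confd_summary() {
  local log="$1" ok failed
  ok="$(grep -cF 'Confd -> config_file post operation result: success' "$log" || true)"
  failed="$(grep -cF 'Confd -> config_file post operation result: failure' "$log" || true)"
  echo "success=$ok failure=$failed"
  [ "$failed" -eq 0 ]
}

# Example:
#   sudo cat /var/log/rackspace-monitoring-agent.log > /tmp/agent.log
#   confd_summary /tmp/agent.log || echo "check the YAML files"
```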
When there is an error in the YAML file, the agent tries to provide as much detail as possible. The following is a list of error messages that you might see.
Tue May 13 02:03:56 2014 ERR: Confd -> config_file post operation result: failure for file, handle: filesystem.yaml at parsing, error {"message":"[object Object]","stack":"Error: [object Object]\n at ConfigState.parseFile
-- You have a syntax error in the YAML file. Consider running it through a YAML lint tool (e.g. http://yamllint.com/) to figure out exactly what is wrong.
Sun May 11 22:41:45 2014 ERR: Confd -> config_file post operation result: failure for check, handle: {"check":"default","filename":"ping-us.yaml"} at create validation, error {"key":"monitoring_zones_poll","message":"monitoring_zones_poll may not be empty"}
-- You are missing the parameter monitoring_zones_poll in the file.
Sun May 11 22:41:45 2014 ERR: Confd -> config_file post operation result: failure for alarm, handle: {"alarm":"packet-loss","filename":"ping-us.yaml"} at create validation, error {"message":"Not a string","key":"check_id","parentKeys":[]}
-- Chances are that there was an error creating the check that this alarm is configured for, so the alarm has no check to attach to.
Sun May 11 22:45:40 2014 ERR: Confd -> config_file post operation result: failure for check, handle: {"check":"default","filename":"ping-us.yaml"} at create, error {"stack":"Error: Object \"MonitoringZone\" with key \"dfw\" does not exist\n at Object.construct ...
-- The values for monitoring_zones_poll come from a predefined list. Valid values have the form mz followed by the zone name, for example, mzdfw (not dfw).
Sun May 11 22:47:15 2014 ERR: Confd -> config_file post operation result: failure for alarm, handle: {"alarm":"packet-loss","filename":"ping-us.yaml"} at create validation, error {"message":"Object \"NotificationPlan\" with key \"pagerduty\" does not exist","key":"notification_plan_id","parentKeys":[]}
-- The notification plan needs to be referenced through its ID and NOT its label.
You can use http://yamllint.com/ to quickly test your YAML syntax before submitting the files through the agent.
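Two of the errors listed above can also be caught locally before the agent reads the file. The following is a minimal sketch, assuming that every remote check needs monitoring_zones_poll (as the error message suggests) and that the key and its list items appear in the simple one-per-line form used in this article. preflight_check is a hypothetical helper; it is not a YAML parser and does not replace a lint tool.

```shell
# Sketch only: preflight_check is an illustrative helper, not part of the agent.
# Prints one message per problem found in file $1 and fails if any were found.
preflight_check() {
  local file="$1" problems
  problems="$(awk '
    /^type[ \t]*:[ \t]*remote\./    { remote = 1 }
    /^monitoring_zones_poll[ \t]*:/ { has_zones = 1; in_zones = 1; next }
    in_zones && /^[ \t]*-[ \t]*/ {
      zone = $0
      sub(/^[ \t]*-[ \t]*/, "", zone)
      if (zone !~ /^mz/) print "zone \"" zone "\" does not start with mz"
      next
    }
    { in_zones = 0 }
    END { if (remote && !has_zones) print "remote check without monitoring_zones_poll" }
  ' "$file")"
  if [ -n "$problems" ]; then
    printf '%s\n' "$problems"
    return 1
  fi
}

# Example:
#   preflight_check ping-us-zones.yaml || exit 1
```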
The following is an example of a ping check. You can find more examples at https://github.com/virgo-agent-toolkit/rackspace-monitoring-agent/tree/master/examples/rackspace_monitoring_agent.conf.d
A ping check is another popular check for getting a sense of whether the server is available. Please note that the target_alias field can differ from server to server. We are looking into making the name more consistent down the road.
ping-us-zones.yaml
type: remote.ping
label: pingv4 from US zones
disabled: false
period: 60
timeout: 30
details:
  count: 5
monitoring_zones_poll:
  - mzdfw
  - mziad
  - mzord
target_alias: public1_v4
alarms:
  packet-loss:
    label: Ping v4 packet loss
    notification_plan_id: npabEPlbCc
    criteria: |
      :set consistencyLevel=ONE
      if (metric['available'] < 80) {
        return new AlarmStatus(CRITICAL, 'Packet loss is greater than 20%, availability at #{available}');
      }
      if (metric['available'] < 95) {
        return new AlarmStatus(WARNING, 'Packet loss is greater than 5%, availability at #{available}');
      }
      return new AlarmStatus(OK, 'Packet loss is normal, availability at #{available}');
Your feedback is highly appreciated! Do you like it? Anything surprising? What do you see as a good use for this feature?
More specifically, we wonder what you think about the design of treating YAML as the source of truth. This means that even though you can still use the API, UI or CLI to modify the checks and alarms created through server-side monitoring configuration, those changes are temporary and will be overridden the next time the agent restarts.