Riak monitoring setup with collectd + Graphite
Hostname "ubuntu-12"
FQDNLookup true
BaseDir "/var/lib/collectd"
PIDFile "/var/run/collectd.pid"
PluginDir "/usr/local/lib/collectd"
TypesDB "/usr/local/share/collectd/types.db"
# LoadPlugin syslog
# <Plugin syslog>
# LogLevel info
# # LogLevel debug
# </Plugin>
LoadPlugin logfile
<Plugin logfile>
# LogLevel debug
# File STDOUT
LogLevel info
File "/var/log/collectd.log"
Timestamp true
PrintSeverity false
</Plugin>
LoadPlugin cpu
LoadPlugin curl_json
LoadPlugin filecount
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin network
LoadPlugin uptime
LoadPlugin write_graphite
<Plugin curl_json>
<URL "http://localhost:8098/stats">
Instance "[email protected]"
<Key "memory_total">
Type "bytes"
</Key>
<Key "memory_processes">
Type "bytes"
</Key>
<Key "memory_system">
Type "bytes"
</Key>
<Key "memory_code">
Type "bytes"
</Key>
<Key "memory_ets">
Type "bytes"
</Key>
<Key "cpu_avg1">
Type "gauge"
</Key>
<Key "cpu_avg5">
Type "gauge"
</Key>
<Key "cpu_avg15">
Type "gauge"
</Key>
<Key "pbc_active">
Type "gauge"
</Key>
<Key "pbc_connects_total">
# NOTE: "counter" vs "gauge"
# A counter type is used for always-increasing values where the
# rate of change (derivative) is more important than the
# increasing value itself. Using a gauge type here will just show
# an increasing line, while counter will show spikes if the
# pbc_connects_total goes up quickly.
Type "counter"
</Key>
<Key "node_gets">
Type "gauge"
</Key>
<Key "node_gets_total">
Type "counter"
</Key>
<Key "node_get_fsm_time_mean">
Type "gauge"
</Key>
<Key "node_get_fsm_time_median">
Type "gauge"
</Key>
<Key "node_get_fsm_time_95">
Type "gauge"
</Key>
<Key "node_get_fsm_time_99">
Type "gauge"
</Key>
<Key "node_get_fsm_time_100">
Type "gauge"
</Key>
<Key "node_get_fsm_objsize_mean">
Type "gauge"
</Key>
<Key "node_get_fsm_objsize_median">
Type "gauge"
</Key>
<Key "node_get_fsm_objsize_95">
Type "gauge"
</Key>
<Key "node_get_fsm_objsize_99">
Type "gauge"
</Key>
<Key "node_get_fsm_objsize_100">
Type "gauge"
</Key>
<Key "node_get_fsm_siblings_mean">
Type "gauge"
</Key>
<Key "node_get_fsm_siblings_median">
Type "gauge"
</Key>
<Key "node_get_fsm_siblings_95">
Type "gauge"
</Key>
<Key "node_get_fsm_siblings_99">
Type "gauge"
</Key>
<Key "node_get_fsm_siblings_100">
Type "gauge"
</Key>
<Key "node_puts">
Type "gauge"
</Key>
<Key "node_puts_total">
Type "counter"
</Key>
<Key "node_put_fsm_time_mean">
Type "gauge"
</Key>
<Key "node_put_fsm_time_median">
Type "gauge"
</Key>
<Key "node_put_fsm_time_95">
Type "gauge"
</Key>
<Key "node_put_fsm_time_99">
Type "gauge"
</Key>
<Key "node_put_fsm_time_100">
Type "gauge"
</Key>
<Key "vnode_gets">
Type "gauge"
</Key>
<Key "vnode_gets_total">
Type "counter"
</Key>
<Key "vnode_puts">
Type "gauge"
</Key>
<Key "vnode_puts_total">
Type "counter"
</Key>
<Key "vnode_index_reads">
Type "gauge"
</Key>
<Key "vnode_index_writes">
Type "gauge"
</Key>
</URL>
</Plugin>
<Plugin filecount>
<Directory "/var/lib/riak/bitcask">
Instance "[email protected]"
Name "*.data"
Recursive true
IncludeHidden false
</Directory>
</Plugin>
<Plugin write_graphite>
<Carbon>
Host "10.0.3.3"
Port "2003"
Protocol "tcp"
LogSendErrors true
# Prefix "collectd"
Postfix "-collectd"
StoreRates true
AlwaysAppendDS false
EscapeCharacter "_"
</Carbon>
</Plugin>
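
The write_graphite settings above (Postfix, EscapeCharacter) shape the metric names that reach Carbon. The following is a minimal sketch of that naming, assuming the layout host+postfix, then plugin-plugin_instance, then type-type_instance, joined by dots. The helper names `escape_part` and `graphite_path` and the instance name `riak_node1` are hypothetical and only for illustration; the sketch escapes dots only, which may be narrower than what collectd itself escapes.

```shell
#!/bin/sh
# Hypothetical helper: replace dots with the EscapeCharacter ("_" in this config)
# so a dot inside one part is not read by Graphite as a path separator.
escape_part() {
    printf '%s' "$1" | tr '.' '_'
}

# Hypothetical helper: build a Graphite metric path.
# Usage: graphite_path HOST POSTFIX PLUGIN PLUGIN_INSTANCE TYPE TYPE_INSTANCE
graphite_path() {
    printf '%s%s.%s-%s.%s-%s\n' \
        "$(escape_part "$1")" "$2" \
        "$(escape_part "$3")" "$(escape_part "$4")" \
        "$(escape_part "$5")" "$(escape_part "$6")"
}

# Hostname and Postfix are taken from this config; "riak_node1" is made up.
graphite_path "ubuntu-12" "-collectd" "curl_json" "riak_node1" "bytes" "memory_total"
# prints: ubuntu-12-collectd.curl_json-riak_node1.bytes-memory_total
```

Under these assumptions, each Key in the curl_json block would appear in Graphite as its own leaf under the same host and plugin instance.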

Riak Monitoring Notes

Build collectd

http://collectd.org

These steps compile collectd 5.4.1 on Ubuntu 12.04 LTS with the curl_json plugin (among others) enabled. The install prefix is /usr/local, and state files go under /var.

$ sudo apt-get install libcurl4-openssl-dev liboping-dev libyajl-dev \
iproute-dev libmnl-dev build-essential
$ tar -xf collectd-5.4.1.tar.bz2
$ cd collectd-5.4.1
$ ./configure --prefix=/usr/local --localstatedir=/var \
--enable-curl_json --enable-cpu --enable-df --enable-disk \
--enable-ethstat --enable-load --enable-memory --enable-netlink \
--enable-numa --enable-ping --enable-processes --enable-protocols \
--enable-swap --enable-tcpconns --enable-uptime --enable-users \
--enable-write_graphite --enable-write_http --enable-filecount \
--enable-debug
$ make
$ sudo make install

To create a binary distribution to copy to other nodes:

$ sudo make install DESTDIR=/tmp/collectd-5.4.1
$ sudo tar -C /tmp/collectd-5.4.1 -czvf /tmp/collectd-5.4.1-bin.tgz .

To install on another node:

$ sudo tar -C / -xf collectd-5.4.1-bin.tgz

Configure collectd

http://www.the-eleven.com/tlegg/blog/2012/05/28/monitoring-riak-collectd-5/

Edit collectd.conf (/usr/local/etc/collectd.conf in this example). The complete file is included in this gist.
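
The "counter" vs "gauge" note in collectd.conf can be illustrated with a little arithmetic. This is a rough sketch, not collectd's own implementation: `counter_rate` is a hypothetical helper, the sample values are invented, and counter wrap-around is ignored. With StoreRates true, write_graphite sends a per-second rate of this kind for counter types instead of the raw total.

```shell
#!/bin/sh
# Hypothetical helper: per-second rate between two samples of a counter.
# Usage: counter_rate PREVIOUS CURRENT INTERVAL_SECONDS
counter_rate() {
    prev=$1
    cur=$2
    interval=$3
    # Integer division; good enough to show the shape of the graph.
    echo $(( (cur - prev) / interval ))
}

# Invented pbc_connects_total samples taken 10 seconds apart.
counter_rate 1000 1000 10   # no new connections: prints 0
counter_rate 1000 1500 10   # burst of 500 connections: prints 50
```

A gauge would plot 1000 and then 1500, an ever-rising line, while the counter plots 0 and then 50, which makes the burst stand out.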

Install Graphite

http://graphite.readthedocs.org/en/latest/install.html

https://www.digitalocean.com/community/articles/installing-and-configuring-graphite-and-statsd-on-an-ubuntu-12-04-vps

apt-get update
apt-get upgrade

apt-get install git-core python-dev python-pip python-cairo-dev \
memcached apache2 apache2-mpm-worker apache2-utils apache2.2-bin \
apache2.2-common libapr1 libaprutil1 libaprutil1-dbd-sqlite3 \
build-essential libapache2-mod-wsgi curl

mkdir graphite
cd graphite
# Clone in parallel, then wait for every clone to finish before installing.
git clone -q https://github.com/graphite-project/graphite-web.git &
git clone -q https://github.com/graphite-project/carbon.git &
git clone -q https://github.com/graphite-project/whisper.git &
git clone -q https://github.com/graphite-project/ceres.git &
wait
(cd ceres && python setup.py install)
(cd whisper && python setup.py install)
(cd carbon && python setup.py install)
pip install django pytz pyparsing django-tagging zope.interface \
    python-memcached twisted
(cd graphite-web && python setup.py install)

cd /opt/graphite/conf

cp carbon.conf.example carbon.conf

cat > storage-schemas.conf <<'EOT'
[stats]
priority = 110
pattern = .*
retentions = 10:2160,60:10080,600:262974
EOT

cd /opt/graphite/webapp/graphite/
cp local_settings.py.example local_settings.py

SECRET_KEY=$(dd if=/dev/urandom bs=64 count=1 2>/dev/null | openssl enc -base64 | tr -d '\n')
set -o noglob
sed -i.bak -e"s|^SECRET_KEY.*$|SECRET_KEY = '$SECRET_KEY'|" local_settings.py
set +o noglob
unset SECRET_KEY

# This sets up the default SQLite database.
django-admin.py syncdb --settings=graphite.settings --pythonpath=/opt/graphite/webapp

# This makes Graphite the default website for Apache
cp /opt/graphite/examples/example-graphite-vhost.conf /etc/apache2/sites-available/default

cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi

# Apache runs as the www-data user and must have access to the DB directory.
chown -R www-data:www-data /opt/graphite/storage

mkdir -p /etc/httpd/wsgi

sed -i.bak -e'/^WSGISocketPrefix/s/.*/WSGISocketPrefix \/etc\/httpd\/wsgi/' /etc/apache2/sites-available/default

mkdir -p /opt/graphite/storage/log/carbon-cache
/opt/graphite/bin/carbon-cache.py start

# This will set up carbon-cache.py to run via Upstart
curl -so /etc/init/carbon-cache.conf \
    https://raw.github.com/captnswing/chef-graphite/master/templates/default/carbon-cache-upstart.erb
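
The retentions line written to storage-schemas.conf above uses the seconds-per-point:points form, so each pair covers a time span equal to the product of its two numbers. The sketch below works through that arithmetic; `retention_seconds` is a hypothetical helper name, not part of Graphite.

```shell
#!/bin/sh
# Hypothetical helper: print the time span covered by each archive in a
# "seconds_per_point:points,..." retentions string.
retention_seconds() {
    # Split the list on commas, then split each pair on the colon.
    echo "$1" | tr ',' '\n' | while IFS=: read -r step points; do
        echo "${step}s x ${points} = $(( step * points ))s"
    done
}

retention_seconds "10:2160,60:10080,600:262974"
# prints:
#   10s x 2160 = 21600s
#   60s x 10080 = 604800s
#   600s x 262974 = 157784400s
```

That is 6 hours of 10-second points, 7 days of 1-minute points, and roughly 5 years of 10-minute points.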