In the interest of shipping high volumes of updates with as little time and effort as possible, I'm proposing two statsd changes (a rough transport sketch follows the list):
- enhance the timer update format
- add a zeromq input (in addition to UDP)
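For context, a timer update today travels as a plain "name:value|ms" string inside a UDP datagram. The sketch below (Python with pyzmq; the ports and the PUSH socket choice are my own illustrative assumptions, not part of the proposal) shows the same payload going out over UDP and over a zeromq socket:

import socket
import zmq

payload = b"api.request_time:320|ms"            # standard statsd timer line

# Today's transport: a fire-and-forget UDP datagram to statsd's default port.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(payload, ("localhost", 8125))

# Proposed extra transport: push the same payload down a zeromq socket
# (hypothetical endpoint; send blocks until a zeromq listener is connected).
ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.connect("tcp://localhost:8126")
push.send(payload)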
use ElasticSearch;

my $es = ElasticSearch->new( servers => 'api.metacpan.org', no_refresh => 1 );
my $scroller = $es->scrolled_search(
    query       => { match_all => {} },
    search_type => 'scan',
    scroll      => '5m',
    index       => 'v0',
    type        => 'release',
);
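A quick sketch of consuming that scroller, assuming the client's ElasticSearch::ScrolledSearch iterator (next() returns one hit at a time and transparently pulls further scroll pages); the distribution and version fields are just illustrative picks from the MetaCPAN release documents:

# Drain the scroll cursor one hit at a time.
while ( my $hit = $scroller->next ) {
    my $release = $hit->{_source};    # the indexed release document
    printf "%s %s\n", $release->{distribution}, $release->{version};
}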
Locate the section for your github remote in the .git/config
file. It looks like this:
[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
url = git@github.com:joyent/node.git
Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
to this section. Obviously, change the github url to match your project's URL. It ends up looking like this:
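[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
url = git@github.com:joyent/node.git

With both fetch lines in place, a plain git fetch origin downloads every pull request as a read-only local ref, so a given one (say number 999) can be checked out with git checkout pr/999.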
Operation: Decouple whisper from graphite.
Method: Create a graphite function that runs a date histogram facet query against elasticsearch for a given query string, over the time period shown in the current graph.
Reason: graphite has some awesome math functions. Wouldn't it be cool if we could use those on logstash results?
The screenshot below uses logstash to watch the twitter stream for the keywords "iphone", "apple" and "samsung"; I then graph each of them, so we get an idea of popularity. As a bonus, I also apply movingAverage() to the iphone curve to show you why this is awesome.
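Under the hood, such a graphite function mostly has to turn the graph's time range and a query string into an elasticsearch date_histogram facet and hand back (timestamp, count) pairs. A minimal sketch in Python, assuming a logstash-style "logstash-*" index, the @timestamp field, a local cluster on port 9200, and the requests library; the glue that registers this as an actual graphite render function is left out:

import json
import requests

def hits_over_time(query_string, start_ms, end_ms, interval="1m"):
    # Filtered query_string query plus a date_histogram facet (the pre-1.0
    # facet API that 0.x-era logstash clusters expose).
    body = {
        "size": 0,  # only the facet counts are needed, not the hits
        "query": {
            "filtered": {
                "query":  {"query_string": {"query": query_string}},
                "filter": {"range": {"@timestamp": {"from": start_ms, "to": end_ms}}},
            }
        },
        "facets": {
            "over_time": {"date_histogram": {"field": "@timestamp", "interval": interval}}
        },
    }
    resp = requests.post("http://localhost:9200/logstash-*/_search", data=json.dumps(body))
    entries = resp.json()["facets"]["over_time"]["entries"]
    # Each entry is {"time": <epoch millis>, "count": N}: exactly the series
    # graphite needs so movingAverage() and friends can be applied to it.
    return [(entry["time"], entry["count"]) for entry in entries]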
Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data lives only in elasticsearch, but those are updated just once a day. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.
#
# Working with branches
#

# Get the current branch name (not so useful in itself, but used in
# other aliases)
branch-name = "!git rev-parse --abbrev-ref HEAD"

# Push the current branch to the remote "origin", and set it to track
# the upstream branch
publish = "!git push -u origin $(git branch-name)"
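With those two aliases in place, publishing a fresh topic branch is a single command, for example:

$ git checkout -b zeromq-input
$ git publish        # expands to: git push -u origin zeromq-input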
user www-data;
worker_processes 1;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    sendfile on;
The regex patterns in this gist are intended only to match web URLs -- http,
https, and naked domains like "example.com". For a pattern that attempts to
match all URLs, regardless of protocol, see: https://gist.github.com/gruber/249502

# Single-line version:
(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|s