In the interest of shipping high volumes of updates with as little time and effort as possible, I'm proposing two statsd changes (a rough transport sketch follows the list):
- enhance the timer update format
- add a zeromq input (in addition to UDP)
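For context, a timer update today travels as a plain "name:value|ms" string inside a UDP datagram. The sketch below (Python with pyzmq; the ports and the PUSH socket choice are my own illustrative assumptions, not part of the proposal) shows the same payload going out over UDP and over a zeromq socket:

import socket
import zmq

payload = b"api.request_time:320|ms"            # standard statsd timer line

# Today's transport: a fire-and-forget UDP datagram to statsd's default port.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(payload, ("localhost", 8125))

# Proposed extra transport: push the same payload down a zeromq socket
# (hypothetical endpoint; send blocks until a zeromq listener is connected).
ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.connect("tcp://localhost:8126")
push.send(payload)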
use ElasticSearch;

my $es = ElasticSearch->new( servers => 'api.metacpan.org', no_refresh => 1 );
my $scroller = $es->scrolled_search(
    query       => { match_all => {} },
    search_type => 'scan',
    scroll      => '5m',
    index       => 'v0',
    type        => 'release',
);
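A quick sketch of consuming that scroller, assuming the client's ElasticSearch::ScrolledSearch iterator (next() returns one hit at a time and transparently pulls further scroll pages); the distribution and version fields are just illustrative picks from the MetaCPAN release documents:

# Drain the scroll cursor one hit at a time.
while ( my $hit = $scroller->next ) {
    my $release = $hit->{_source};    # the indexed release document
    printf "%s %s\n", $release->{distribution}, $release->{version};
}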
Locate the section for your github remote in the .git/config
file. It looks like this:
[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
url = git@github.com:joyent/node.git
Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
to this section. Obviously, change the github url to match your project's URL. It ends up looking like this:
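[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
url = git@github.com:joyent/node.git

With both fetch lines in place, a plain git fetch origin downloads every pull request as a read-only local ref, so a given one (say number 999) can be checked out with git checkout pr/999.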
Operation: Decouple whisper from graphite.
Method: Create a graphite function that runs a date histogram facet query against elasticsearch for a given query string, over the time period shown in the current graph.
Reason: graphite has some awesome math functions. Wouldn't it be cool if we could use those on logstash results?
The screenshot below uses logstash to watch the twitter stream for the keywords "iphone", "apple" and "samsung"; I then graph each of them, so we get an idea of popularity. As a bonus, I also apply movingAverage() to the iphone curve to show you why this is awesome.
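Under the hood, such a graphite function mostly has to turn the graph's time range and a query string into an elasticsearch date_histogram facet and hand back (timestamp, count) pairs. A minimal sketch in Python, assuming a logstash-style "logstash-*" index, the @timestamp field, a local cluster on port 9200, and the requests library; the glue that registers this as an actual graphite render function is left out:

import json
import requests

def hits_over_time(query_string, start_ms, end_ms, interval="1m"):
    # Filtered query_string query plus a date_histogram facet (the pre-1.0
    # facet API that 0.x-era logstash clusters expose).
    body = {
        "size": 0,  # only the facet counts are needed, not the hits
        "query": {
            "filtered": {
                "query":  {"query_string": {"query": query_string}},
                "filter": {"range": {"@timestamp": {"from": start_ms, "to": end_ms}}},
            }
        },
        "facets": {
            "over_time": {"date_histogram": {"field": "@timestamp", "interval": interval}}
        },
    }
    resp = requests.post("http://localhost:9200/logstash-*/_search", data=json.dumps(body))
    entries = resp.json()["facets"]["over_time"]["entries"]
    # Each entry is {"time": <epoch millis>, "count": N}: exactly the series
    # graphite needs so movingAverage() and friends can be applied to it.
    return [(entry["time"], entry["count"]) for entry in entries]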
Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data lives only in elasticsearch, but those are updated just once a day. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.
#
# Working with branches
#

# Get the current branch name (not so useful in itself, but used in
# other aliases)
branch-name = "!git rev-parse --abbrev-ref HEAD"

# Push the current branch to the remote "origin", and set it to track
# the upstream branch
publish = "!git push -u origin $(git branch-name)"
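With those two aliases in place, publishing a fresh topic branch is a single command, for example:

$ git checkout -b zeromq-input
$ git publish        # expands to: git push -u origin zeromq-input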
user www-data;
worker_processes 1;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    sendfile on;
The regex patterns in this gist are intended only to match web URLs -- http,
https, and naked domains like "example.com". For a pattern that attempts to
match all URLs, regardless of protocol, see: https://gist.github.com/gruber/249502

# Single-line version:
(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|s