JK.Ryan jkryanchou

Moved

Now located at https://github.com/JeffPaine/beautiful_idiomatic_python.

Why it was moved

Github gists don't support Pull Requests or any notifications, which made it impossible for me to maintain this (surprisingly popular) gist with fixes, respond to comments and so on. In the interest of maintaining the quality of this resource for others, I've moved it to a proper repo. Cheers!

Performance of Flask, Tornado, GEvent, and their combinations

Wensheng Wang, 10/1/11

Source: http://blog.wensheng.org/2011/10/performance-of-flask-tornado-gevent-and.html

When choosing a web framework, I pretty much have eyes set on Tornado. But I heard good things about Flask and Gevent. So I tested the performance of each and combinations of the three. I chose something just a little more advanced than a "Hello World" program to write - one that use templates. Here are the codes:

1, Pure Flask (pure_flask.py)

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination

Next Steps

Measure time spend on index, flush, refresh, merge, query, etc. (TD - done)
Take hot threads snapshots under read+write, read-only, write-only (TD - done)
Adjust refresh time to 10s (from 1s) and see how load changes (TD)
Measure time of a rolling restart doing disable_flush and disable_recovery (TD)
Specify routing on query -- make it choose same node for each shard each time (MD)
GC new generation size (TD)
Warmers

measure before/after of client query time with and without warmers (MD)

(This gist is pretty old; I've written up my current approach to the Pyramid integration on this blog post, but that blog post doesn't go into the transactional management, so you may still find this useful.)

Fast and flexible unit tests with live Postgres databases and fixtures

I've created a Pyramid scaffold which integrates Alembic, a migration tool, with the standard SQLAlchemy scaffold. (It also configures the Mako template system, because I prefer Mako.)

I am also using PostgreSQL for my database. PostgreSQL supports nested transactions. This means I can setup the tables at the beginning of the test session, then start a transaction before each test happens and roll it back after the test; in turn, this means my tests operate in the same environment I expect to use in production, but they are also fast.

I based my approach on [sontek's blog post](http://sontek.net/blog/

Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.

To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.

Our setup:

elasticsearch

We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data is stored only in elasticsearch, but are updated once daily only. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.

Locate the section for your github remote in the .git/config file. It looks like this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = [email protected]:joyent/node.git

Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/* to this section. Obviously, change the github url to match your project's URL. It ends up looking like this:

	import os
	import inspect
	from functools import wraps


	_file_cache = {}


	def cache_file(path):
	def cache_is_fresh(name):

	##################################################################
	# /etc/elasticsearch/elasticsearch.yml
	#
	# Base configuration for a write heavy cluster
	#

	# Cluster / Node Basics
	cluster.name: logng

	# Node can have abritrary attributes we can use for routing

	If you want, I can try and help with pointers as to how to improve the indexing speed you get. Its quite easy to really increase it by using some simple guidelines, for example:

	- Use create in the index API (assuming you can).
	- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
	- Increase the indexing buffer size (indices.memory.index_buffer_size), it defaults to the value 10% which is 10% of the heap.
	- Increase the number of dirty operations that trigger automatic flush (so the translog won't get really big, even though its FS based) by setting index.translog.flush_threshold (defaults to 5000).
	- Increase the memory allocated to elasticsearch node. By default its 1g.
	- Start with a lower replica count (even 0), and then once the bulk loading is done, increate it to the value you want it to be using the update_settings API. This will improve things as possibly less shards will be allocated to each machine.
	- Increase the number of machines you have so