Notes from Tuning a Legacy Rails App: How to Make an Elephant Sprint
### Measuring Performance
- Monitor the values of specific code paths and graph them to see performance over time (response times as one example metric)
- Automated tests that measure performance can fail based on a set threshold
- If a given code path's response time exceeds the existing baseline by 20%, the automated test fails, alerting ops and devs that a recent code change has degraded performance beyond a pre-defined SLA or threshold (see the sketch after this list)
- Need a production-like environment
- Make that performance test environment exclusive to performance testing (don't let regular usage or QA usage affect the test results)
- Using NewRelic to compare boxes against each other
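A minimal sketch of what such a threshold test could look like, assuming Minitest and a dedicated performance host; the endpoint, baseline number, and host name are illustrative, not from the talk:

```ruby
# Hypothetical example, not from the talk: fail the build when a code path
# regresses more than 20% against its recorded baseline response time.
require "minitest/autorun"
require "net/http"
require "benchmark"

class SearchPerformanceTest < Minitest::Test
  BASELINE_SECONDS = 0.450   # last accepted response time for this path
  THRESHOLD        = 1.20    # allow up to 20% over the baseline

  def test_search_response_time
    elapsed = Benchmark.realtime do
      Net::HTTP.get(URI("http://perf-test.example.com/search?q=widgets"))
    end
    assert elapsed <= BASELINE_SECONDS * THRESHOLD,
           "search took #{elapsed.round(3)}s; baseline #{BASELINE_SECONDS}s + 20% exceeded"
  end
end
```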
### Performance Test Suite
- JMeter, Gatling, OpenSTA, Tsung (the speaker used JMeter)
- Recording functionality to maintain the tests
- Validations on page access to avoid false results
- Parameterize tests to use different data (via different users so cached queries don't throw off test results)
- Tests can run as a distributed test suite to simulate actual user access
- Tests run headless against the nightly build
- Ideal: create an "ultimate" test suite based on production logs (replaying the production logs; see the sketch after this list)
- Biggest take home point: Use NewRelic
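A rough sketch of the log-replay idea, assuming an Apache/Nginx combined-format access log and replaying only GET requests against the dedicated performance host (the file name and host are made up):

```ruby
# Hypothetical sketch: replay GET request paths from a production access log
# against the performance environment. Log file and target host are made up.
require "net/http"

TARGET = URI("http://perf-test.example.com")

File.foreach("production_access.log") do |line|
  # combined log format contains: "GET /some/path HTTP/1.1"
  next unless line =~ %r{"GET (\S+) HTTP}
  path = Regexp.last_match(1)
  Net::HTTP.start(TARGET.host, TARGET.port) do |http|
    response = http.get(path)
    puts "#{response.code} #{path}"
  end
end
```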
### Fixing Your Legacy Application
- Akamai (using ISPs as a CDN to cache images and common views)
- Running Apache as the web server, with several instances of the application behind it
- Each process connects to Moxi as a proxy layer for Memcached
- Additionally used an in-process cache for frequently requested objects (like user data)
### Out of Band GC
- Trigger GC only out of band (meaning not during a normal HTTP request or during the execution of application code)
- Available in Passenger 4
- Increase GC limit to skip GC during a single request
- Fine-tune the max number of Passenger processes to handle concurrent requests
- Using Ruby GC parameters to delay GC so it triggers every 5th request rather than on every request gave a significant application performance improvement
- With out-of-band GC the memory footprint grows (the number of processes running on the application server was reduced from 30 to 20)
- Had to find a sweet spot: allow many objects to be created in a request, but few enough that Ruby's GC isn't triggered during the request, then GC those objects immediately after the request (done by delaying GC to every 5 requests with a higher-than-normal GC_MALLOC_LIMIT; a middleware sketch follows the settings below)
export RUBY_HEAP_MIN_SLOTS=3000000
export RUBY_GC_MALLOC_LIMIT=120000000
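The talk relied on Passenger 4's built-in out-of-band GC together with the settings above. Purely to illustrate the "GC after every 5th request, not during it" idea, here is a rough Rack middleware sketch (not the speaker's code, and not Passenger's implementation):

```ruby
# Illustration only: count requests in a Rack middleware and run GC after
# every 5th response has been served, instead of letting Ruby GC mid-request.
# Passenger 4's out-of-band GC (and Unicorn::OobGC) do this at the server layer.
require "rack/body_proxy"

class OutOfBandGC
  INTERVAL = 5  # run GC after every 5th request handled by this process

  def initialize(app)
    @app = app
    @requests = 0
  end

  def call(env)
    status, headers, body = @app.call(env)
    @requests += 1
    # Rack::BodyProxy calls the block once the body has been served and closed.
    body = Rack::BodyProxy.new(body) do
      if @requests >= INTERVAL
        @requests = 0
        GC.start
      end
    end
    [status, headers, body]
  end
end
```

It would be enabled with `use OutOfBandGC` in config.ru; in practice the Passenger/Unicorn hooks are the better place for this, since they run strictly between requests.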
### Fragment Caching
- Caching of a data row in search results
- Caching of user menu items
- Caching of non-user specific display snippet (mostly static, drop down elements)
- Force the cache key to include all changeable element ids (app_user_123 as the cache key, where 123 is the user id; see the ERB sketch after this list)
- May inject user-specific data to improve cache usage
- Use Ajax calls for links that contain user-specific data
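An illustrative ERB sketch of the user-scoped cache key idea (the partial name and `current_user` helper are assumptions, not from the talk):

```erb
<%# Illustrative only: the key carries every id that can change the output,
    e.g. app_user_123 where 123 is the user id. %>
<% cache "app_user_#{current_user.id}_menu" do %>
  <%= render "shared/user_menu", user: current_user %>
<% end %>
```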
### Optimize Active Record
- In-process caching of frequently used objects
- Memcached (via Moxi proxy)
- Preloading of associations (`includes`)
- Use id-based queries rather than object-based queries
- Use raw SQL to optimize queries (examples of all three below)
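Illustrative snippets of the three points above, using made-up model names and the Rails 3+ relation API (the app in the talk may well have used the older `:include`-style syntax):

```ruby
# Illustrative only; model and column names are invented.

# 1. Preload associations so rendering doesn't trigger an N+1 query storm:
orders = Order.includes(:line_items, :customer).where(status: "open")

# 2. Query by id rather than by loaded objects:
order_ids = orders.map(&:id)
items = LineItem.where(order_id: order_ids)

# 3. Drop to raw SQL when the generated query is too slow:
totals = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT orders.id, SUM(line_items.price) AS total
    FROM orders
    JOIN line_items ON line_items.order_id = orders.id
   GROUP BY orders.id
SQL
```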
### Caching Is Not Free
- Have to create policies for how and when to expire caches (a callback sketch follows this list)
- Get a little bit of automatic cache updating for free when updating an AR object
- Cache infrastructure (Memcached, Moxi proxy layer, etc) adds extra overhead to environment complexity, deployments, debugging, cost of infrastructure, etc.
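One way such an expiration policy could be wired up, assuming Rails.cache is backed by Memcached; this is a generic sketch with invented key names, not the speaker's code:

```ruby
# Generic sketch (key names invented): read through the cache with a TTL and
# delete the entry whenever the record changes, so both time- and event-based
# expiration are covered.
class User < ActiveRecord::Base
  CACHE_TTL = 5 * 60  # seconds

  after_commit :expire_cache

  def self.cached_find(id)
    Rails.cache.fetch("app_user_#{id}", expires_in: CACHE_TTL) { find(id) }
  end

  private

  def expire_cache
    Rails.cache.delete("app_user_#{id}")
  end
end
```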
### Deployments
- Happened every three weeks
- Cold deployments
- Could theoretically use blue-green deployments (one set of live production servers, one set of non-live production servers)
- With several servers (12 in total), they monitored 4 production servers and their performance testing server
### My Questions
- Why the use of the Moxi proxy for talking to Memcached?
- 2 Memcached servers serving two application servers, each running about 20 processes, resulted in a lot of network traffic
- This network traffic caused delays in retrieving things from Memcached, which introduced latency
- Moxi had its own cache for specific commonly requested resources (like a list of the 50 states)
- Moxi also had policies for determining which Memcached server to hit (structuring the data in Memcached so that requests could be directed via policy through the Moxi proxy layer)
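In that kind of setup each application server typically runs a local Moxi speaking the memcached protocol and fanning out to the real Memcached boxes, so the Rails client only ever talks to localhost. A hedged sketch of the client side (the port and store choice are assumptions):

```ruby
# config/environments/production.rb -- assumption: Moxi listens on the
# standard memcached port on each app server and routes to the Memcached
# cluster, so the Rails cache client simply points at localhost.
config.cache_store = :mem_cache_store, "127.0.0.1:11211"
```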
- How did you determine when to cache things in process?
- Using NewRelic to determine what the most common transactions are, and identifying which of those common transactions were the most expensive
- Using a simple hash: the requested object is first looked up in an in-process (environment) hash; if that results in a cache miss, the Moxi layer is queried for the object; and if the object doesn't exist in Memcached, the DB is queried
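A rough reconstruction of that lookup chain, with invented method and key names; the in-process hash would need its own clearing strategy (per deploy or per time window), which is omitted here:

```ruby
# Invented names; illustrates the three-tier lookup: process-local hash,
# then Memcached via Moxi (Rails.cache here), then the database.
module ObjectLookup
  LOCAL = {}  # per-process cache; needs its own reset policy in real code

  def self.fetch_user(id)
    key = "app_user_#{id}"
    LOCAL[key] ||= Rails.cache.fetch(key, expires_in: 5 * 60) do
      User.find(id)  # full miss: hit the database
    end
  end
end
```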
- What kind of expiration policy did you use for your caches?
- Had a simple 5-minute expiration policy for everything in the fragment cache; relied on ActiveRecord's built-in cache support when updating AR objects (a sketch of the TTL follows)
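A sketch of that blanket 5-minute policy on a single fragment, assuming a cache store that honors `:expires_in` (e.g. the memcached store); the key and partial are illustrative:

```erb
<%# Sketch of the blanket 5-minute policy on a cached search-result row. %>
<% cache "search_row_#{result.id}", expires_in: 5.minutes do %>
  <%= render "search/result_row", result: result %>
<% end %>
```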