Notes from Tuning a Legacy Rails App: How to Make an Elephant Sprint
### Measuring Performance
- Monitor the values of specific code paths and graph them to see performance over time (response times as one example metric)
- Automated tests that measure performance can fail based on a set threshold
- If a given code path's response time exceeds the existing baseline by 20%, the automated test fails, alerting ops and devs that a recent code change has degraded performance beyond a pre-defined SLA or threshold (see the sketch after this list)
- Need a production-like environment
- Make that performance test environment exclusive to performance testing (don't let regular usage or QA usage affect the test results)
- Using NewRelic to compare boxes against each other
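A minimal sketch of what such a threshold test could look like, assuming Minitest and a dedicated performance host; the endpoint, baseline number, and host name are illustrative, not from the talk:

```ruby
# Hypothetical example, not from the talk: fail the build when a code path
# regresses more than 20% against its recorded baseline response time.
require "minitest/autorun"
require "net/http"
require "benchmark"

class SearchPerformanceTest < Minitest::Test
  BASELINE_SECONDS = 0.450   # last accepted response time for this path
  THRESHOLD        = 1.20    # allow up to 20% over the baseline

  def test_search_response_time
    elapsed = Benchmark.realtime do
      Net::HTTP.get(URI("http://perf-test.example.com/search?q=widgets"))
    end
    assert elapsed <= BASELINE_SECONDS * THRESHOLD,
           "search took #{elapsed.round(3)}s; baseline #{BASELINE_SECONDS}s + 20% exceeded"
  end
end
```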
### Performance Test Suite
- JMeter, Gatling, OpenSTA, Tsung (the speaker used JMeter)
- Recording functionality to maintain the tests
- Validations on page access to avoid false results
- Parameterize tests to use different data (via different users so cached queries don't throw off test results)
- Tests can run as a distributed test suite to simulate actual user access
- Tests run headless against the nightly build
- Ideal: create an "ultimate" test suite based on production logs (replaying the production logs; see the sketch after this list)
- Biggest take home point: Use NewRelic
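A rough sketch of the log-replay idea, assuming an Apache/Nginx combined-format access log and replaying only GET requests against the dedicated performance host (the file name and host are made up):

```ruby
# Hypothetical sketch: replay GET request paths from a production access log
# against the performance environment. Log file and target host are made up.
require "net/http"

TARGET = URI("http://perf-test.example.com")

File.foreach("production_access.log") do |line|
  # combined log format contains: "GET /some/path HTTP/1.1"
  next unless line =~ %r{"GET (\S+) HTTP}
  path = Regexp.last_match(1)
  Net::HTTP.start(TARGET.host, TARGET.port) do |http|
    response = http.get(path)
    puts "#{response.code} #{path}"
  end
end
```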
### Fixing Your Legacy Application
- Akamai (using ISPs as a CDN to cache images and common views)
- Running Apache as the web server, with several instances of the application behind it
- Each process connects to Moxi as a proxy layer for Memcached
- Additionally used an in-process cache for frequently requested objects (like user data)
### Out of Band GC
- Trigger GC only out of band (meaning not during a normal HTTP request or during the execution of application code)
- Available in Passenger 4
- Increase GC limit to skip GC during a single request
- Fine-tune the max number of Passenger processes to handle concurrent requests
- Using Ruby GC parameters to delay GC so it triggers every 5th request rather than on every request gave a significant application performance improvement
- With out-of-band GC the memory footprint grows (the number of processes running on the application server was reduced from 30 to 20)
- Had to find a sweet spot: allow many objects to be created in a request, but few enough that Ruby's GC isn't triggered during the request, then GC those objects immediately after the request (done by delaying GC to every 5 requests with a higher-than-normal GC_MALLOC_LIMIT; a middleware sketch follows the settings below)
export RUBY_HEAP_MIN_SLOTS=3000000
export RUBY_GC_MALLOC_LIMIT=120000000
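The talk relied on Passenger 4's built-in out-of-band GC together with the settings above. Purely to illustrate the "GC after every 5th request, not during it" idea, here is a rough Rack middleware sketch (not the speaker's code, and not Passenger's implementation):

```ruby
# Illustration only: count requests in a Rack middleware and run GC after
# every 5th response has been served, instead of letting Ruby GC mid-request.
# Passenger 4's out-of-band GC (and Unicorn::OobGC) do this at the server layer.
require "rack/body_proxy"

class OutOfBandGC
  INTERVAL = 5  # run GC after every 5th request handled by this process

  def initialize(app)
    @app = app
    @requests = 0
  end

  def call(env)
    status, headers, body = @app.call(env)
    @requests += 1
    # Rack::BodyProxy calls the block once the body has been served and closed.
    body = Rack::BodyProxy.new(body) do
      if @requests >= INTERVAL
        @requests = 0
        GC.start
      end
    end
    [status, headers, body]
  end
end
```

It would be enabled with `use OutOfBandGC` in config.ru; in practice the Passenger/Unicorn hooks are the better place for this, since they run strictly between requests.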
### Fragment Caching
- Caching of a data row in search results
- Caching of user menu items
- Caching of non-user specific display snippet (mostly static, drop down elements)
- Force the cache key to include all changeable element ids (app_user_123 as the cache key, where 123 is the user id; see the ERB sketch after this list)
- May inject user-specific data to improve cache usage
- Use Ajax calls for links that contain user-specific data
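An illustrative ERB sketch of the user-scoped cache key idea (the partial name and `current_user` helper are assumptions, not from the talk):

```erb
<%# Illustrative only: the key carries every id that can change the output,
    e.g. app_user_123 where 123 is the user id. %>
<% cache "app_user_#{current_user.id}_menu" do %>
  <%= render "shared/user_menu", user: current_user %>
<% end %>
```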
### Optimize Active Record
- In-process caching of frequently used objects
- Memcached (via Moxi proxy)
- Preloading of associations (`includes`)
- Use id-based queries rather than object-based queries
- Use raw SQL to optimize queries (examples of all three below)
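Illustrative snippets of the three points above, using made-up model names and the Rails 3+ relation API (the app in the talk may well have used the older `:include`-style syntax):

```ruby
# Illustrative only; model and column names are invented.

# 1. Preload associations so rendering doesn't trigger an N+1 query storm:
orders = Order.includes(:line_items, :customer).where(status: "open")

# 2. Query by id rather than by loaded objects:
order_ids = orders.map(&:id)
items = LineItem.where(order_id: order_ids)

# 3. Drop to raw SQL when the generated query is too slow:
totals = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT orders.id, SUM(line_items.price) AS total
    FROM orders
    JOIN line_items ON line_items.order_id = orders.id
   GROUP BY orders.id
SQL
```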
### Caching Is Not Free
- Have to create policies for how and when to expire caches (a callback sketch follows this list)
- Get a little bit of automatic cache updating for free when updating an AR object
- Cache infrastructure (Memcached, Moxi proxy layer, etc) adds extra overhead to environment complexity, deployments, debugging, cost of infrastructure, etc.
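One way such an expiration policy could be wired up, assuming Rails.cache is backed by Memcached; this is a generic sketch with invented key names, not the speaker's code:

```ruby
# Generic sketch (key names invented): read through the cache with a TTL and
# delete the entry whenever the record changes, so both time- and event-based
# expiration are covered.
class User < ActiveRecord::Base
  CACHE_TTL = 5 * 60  # seconds

  after_commit :expire_cache

  def self.cached_find(id)
    Rails.cache.fetch("app_user_#{id}", expires_in: CACHE_TTL) { find(id) }
  end

  private

  def expire_cache
    Rails.cache.delete("app_user_#{id}")
  end
end
```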
### Deployments
- Happened every three weeks
- Cold deployments
- Could theoretically use blue-green deployments (one set of live production servers, one set of non-live production servers)
- With several servers (12 in total), they monitored 4 production servers and their performance testing server
### My Questions
- Why the use of the Moxi proxy for talking to Memcached?
- 2 Memcached servers serving two application servers, each running about 20 processes, resulted in a lot of network traffic
- This network traffic caused delays in retrieving things from Memcached, which introduced latency
- Moxi had its own cache for specific commonly requested resources (like a list of the 50 states)
- Moxi also had policies for determining which Memcached server to hit (structuring the data in Memcached so that requests could be directed via policy through the Moxi proxy layer)
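In that kind of setup each application server typically runs a local Moxi speaking the memcached protocol and fanning out to the real Memcached boxes, so the Rails client only ever talks to localhost. A hedged sketch of the client side (the port and store choice are assumptions):

```ruby
# config/environments/production.rb -- assumption: Moxi listens on the
# standard memcached port on each app server and routes to the Memcached
# cluster, so the Rails cache client simply points at localhost.
config.cache_store = :mem_cache_store, "127.0.0.1:11211"
```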
- How did you determine when to cache things in process?
- Using NewRelic to determine what the most common transactions are, and identifying which of those common transactions were the most expensive
- Using a simple hash: the requested object is first looked up in an in-process (environment) hash; if that results in a cache miss, the Moxi layer is queried for the object; and if the object doesn't exist in Memcached, the DB is queried
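A rough reconstruction of that lookup chain, with invented method and key names; the in-process hash would need its own clearing strategy (per deploy or per time window), which is omitted here:

```ruby
# Invented names; illustrates the three-tier lookup: process-local hash,
# then Memcached via Moxi (Rails.cache here), then the database.
module ObjectLookup
  LOCAL = {}  # per-process cache; needs its own reset policy in real code

  def self.fetch_user(id)
    key = "app_user_#{id}"
    LOCAL[key] ||= Rails.cache.fetch(key, expires_in: 5 * 60) do
      User.find(id)  # full miss: hit the database
    end
  end
end
```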
- What kind of expiration policy did you use for your caches?
- Had a simple 5-minute expiration policy for everything in the fragment cache; relied on ActiveRecord's built-in cache support when updating AR objects (a sketch of the TTL follows)
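A sketch of that blanket 5-minute policy on a single fragment, assuming a cache store that honors `:expires_in` (e.g. the memcached store); the key and partial are illustrative:

```erb
<%# Sketch of the blanket 5-minute policy on a cached search-result row. %>
<% cache "search_row_#{result.id}", expires_in: 5.minutes do %>
  <%= render "search/result_row", result: result %>
<% end %>
```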