Load testing vs performance testing
A common mistake made in our industry is to conflate performance testing with
load testing. While both are important, neither is a substitute for the other.
The former helps assess whether a system is performant. Can it accomplish some
task in a short enough period of time to be effective? The latter is used to
understand how the performance changes as more and more demand is put on the
system. I came to appreciate the value of both while working for AOL, back when
30 million people looked to that service as the way to get online.
At the time, we tried to convert people from dial-up access to broadband while
simultaneously maintaining the billing relationship with the customer. This
strategy ultimately proved unsuccessful (people preferred to buy access
directly from the phone or cable company), but it was the best hope the company
had for maintaining a dominant position as an ISP.
My team was responsible for the qualification and provisioning systems in this
effort. Qualification was particularly challenging. In order to be eligible for
DSL, you had to live within a short distance of a telephone exchange that had a
Digital Subscriber Line Access Multiplexer (DSLAM) with open capacity. And just
to make it interesting, the workers in the local chapter of the Communications
Workers of America couldn't be on strike at the time.
The experience we had to implement was to qualify users in the background
during the login flow, using the phone number we had on file. If the user
qualified, they were prompted to upgrade to DSL.
Our qualification system had to perform well. Delays not only hurt the
conversion rate but also degraded the user experience, since users had been
conditioned to expect these login popups (remember those?) to appear early
rather than late. In addition, the system had to be capable of handling
substantial load: at the peak time of day, several hundred people per second
would log in and begin the qualification process.
This entire experience taught me three fundamental lessons.
1. It's important not to confuse performance assessment with load testing. As
mentioned, the former is all about ensuring that software reacts to inputs in
an acceptably short period of time. The latter is about understanding how much
hardware is needed to serve expected demand and whether any parts of the
architecture unacceptably limit this (e.g. a single shared database that all
requests queue up behind). A sketch contrasting the two follows this list.
2. Synthetic load testing is at best a poor proxy for understanding how your
system will perform under actual load. Real-world usage has traffic patterns
far more complex and varied than anything you're likely to capture in a load
testing script. This matters a great deal, because varied traffic prevents
code and data from being cached the way they are in simple tests.
3. The demand on a system is rarely constant. Much more typical is a daily
traffic pattern where load is high at predictable times of day and lower at
others. Users do the same basic things throughout the day, but there are
clusters of time when more of them are doing those things at once. It's only
at these times of peak demand that having a good handle on your capacity
really matters.
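To make the distinction in the first lesson concrete, here is a minimal Ruby
sketch. The endpoint, budget, and concurrency levels are illustrative
assumptions, not details from the AOL system: the first check asks whether a
single request is fast enough (performance), while the second ramps up
concurrency and watches how timing degrades (load).

```ruby
require "net/http"
require "benchmark"

# Hypothetical staging endpoint; the real qualification service is not public.
TARGET = URI("http://staging.example.com/qualify")

# Performance assessment: is one request acceptably fast?
single = Benchmark.realtime { Net::HTTP.get_response(TARGET) }
puts format("single request: %.3fs", single)

# Load testing: how does total time change as concurrency grows?
[1, 10, 50, 100].each do |concurrency|
  elapsed = Benchmark.realtime do
    concurrency.times.map { Thread.new { Net::HTTP.get_response(TARGET) } }
               .each(&:join)
  end
  puts format("%3d concurrent requests: %.3fs", concurrency, elapsed)
end
```

The first number answers "is it fast?"; the second series answers "how does
fast change under pressure?" Those are different questions, and conflating
them is exactly the mistake this post is about.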
At any given stage, you need to be clear-headed about what it is you need to
understand about your application. For example, if you are concerned with how
the user experience suffers under low bandwidth or on less-than-modern
hardware, you're dealing with a performance concern. In such cases, tools
that analyze individual transactions are sufficient. In Ruby, this means
tools like New Relic and the bullet gem, which flag database queries that can
be optimized. For web applications, Google PageSpeed Insights and Yahoo's
YSlow are invaluable.
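As a concrete example of transaction-level analysis, here is a minimal bullet
setup for a Rails development environment. This is a sketch: the settings
shown are real bullet options, but exact configuration varies by project.

```ruby
# config/environments/development.rb
config.after_initialize do
  Bullet.enable        = true  # turn N+1 query detection on
  Bullet.bullet_logger = true  # write findings to log/bullet.log
  Bullet.rails_logger  = true  # also surface them in the Rails log
  Bullet.alert         = true  # pop a JavaScript alert in the browser
end
```

With this in place, bullet flags N+1 queries and unused eager loading as you
click through the app, which is exactly the single-transaction view a
performance concern calls for.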
If instead you're trying to understand whether the system can respond to
expected demand and scale, then load testing is what's called for. There are
tools that purport to help with this; Apache's JMeter is one example. But
after my time at AOL, I consider myself a skeptic of this brand of tool. As
mentioned above, there are too many differences between real and synthetic
load to completely trust the results of a test.
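A sketch of the root of that skepticism, with a hypothetical endpoint and
phone numbers of my own invention: a naive synthetic script replays the same
input over and over, which keeps every cache warm, while real users scatter
requests across the key space.

```ruby
require "net/http"

# Hypothetical endpoint; real qualification keyed on the user's phone number.
BASE = "http://staging.example.com/qualify?phone="

# What a naive synthetic script does: the same input repeatedly. After the
# first request, every cache layer is warm, so measured latency flatters
# the system.
100.times { Net::HTTP.get_response(URI(BASE + "7035550100")) }

# Real traffic is closer to this: requests scattered across the key space,
# defeating caches and exercising cold code and data paths.
100.times { Net::HTTP.get_response(URI(BASE + format("703555%04d", rand(10_000)))) }
```

Even this second loop only crudely approximates real traffic; it's meant to
show how easily a test script can measure the cache rather than the system.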
To complete the story of my AOL days, we did learn how to do load testing
right. Once we realized that simple load tests were getting us nowhere near
the confidence we needed to provision adequate hardware, our first reaction
was to capture actual production traffic (e.g. Apache server logs) and build
tools to replay it against test hardware. This was better, but it was
difficult and expensive to do, and it still lacked key real-world
characteristics that are just plain hard to model.
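As a rough illustration of the replay idea (the actual AOL tooling was more
involved; the host name and log format here are assumptions), a script along
these lines can re-issue logged GET requests against test hardware:

```ruby
require "net/http"

# Hypothetical replay target.
TEST_HOST = "staging.example.com"

# Replay GET requests from an Apache access log. Assumes the common log
# format, where the request line is the first double-quoted field,
# e.g. "GET /some/path HTTP/1.0".
File.foreach("access.log") do |line|
  request_line = line[/"([^"]+)"/, 1] or next
  method, path, = request_line.split
  next unless method == "GET"
  Net::HTTP.get_response(URI("http://#{TEST_HOST}#{path}"))
end
```

Note what even this leaves out: request timing and bursts, POST bodies,
session state, and client behavior like retries. Those gaps are the
"real-world characteristics" that are so hard to model.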
We eventually hit upon the realization that we could use lesson three to our
advantage. Who needs synthetic load testing when you can use the real thing
and get better information? To do this, you have to have a system that is
instrumented well enough for you to know when it's under stress. Queue sizes
are measured. Failure and abandonment rates can be read and interpreted.
Server hardware metrics (e.g. CPU and RAM utilization) are exposed. Once you
have that in place, you start at a part of the day when real load is low and
remove some of your production capacity (e.g. via a load balancer
adjustment). Then you watch your numbers as traffic grows over the course of
the day. Once your instrumentation tells you that user-facing performance is
starting to degrade to unacceptable levels, you flip your load balancer
configuration back so you are again at full capacity. After this exercise,
you have an understanding of system performance under load that you can be
far more confident in than anything synthetic load testing can tell you, even
tests driven by traffic captured from actual production usage.
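A minimal sketch of the watchdog this technique implies. The metric names,
endpoint, budget, and restore script are all hypothetical; the original
system's instrumentation and load balancer mechanics were more involved.

```ruby
require "net/http"
require "json"

# Hypothetical instrumentation endpoint and degradation threshold.
METRICS    = URI("http://lb.example.com/metrics")
P95_BUDGET = 0.5 # seconds of request latency before we call it degraded

loop do
  stats = JSON.parse(Net::HTTP.get(METRICS))
  p95   = stats.fetch("p95_latency_seconds")
  queue = stats.fetch("queue_depth")
  puts format("p95=%.3fs queue=%d", p95, queue)

  if p95 > P95_BUDGET
    # Degradation observed: flip the load balancer back to full capacity.
    system("./restore_full_capacity.sh") # hypothetical operator script
    break
  end
  sleep 60
end
```

The design point is that the experiment has a built-in abort: the moment real
users start to feel it, you restore capacity, and the data you've collected
up to that point is your capacity measurement.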
No approach to performance or load testing is foolproof. You have to resign
yourself to getting battle scars that inform future system design and
implementation. But it all starts with an understanding of the differences
between performance and load testing.