Created
July 7, 2015 20:34
Blog Draft
Load testing vs performance testing

A common mistake made in our industry is to conflate performance testing with
load testing. While both are important, neither is a substitute for the other.
The former helps assess whether a system is performant. Can it accomplish some
task in a short enough period of time to be effective? The latter is used to
understand how the performance changes as more and more demand is put on the
system. I came to appreciate the value of both while working for AOL back at
the time when 30 million people looked to that service as the way to get
online.

At the time, we tried to convert people from dial-up access to broadband while
simultaneously maintaining the billing relationship with the customer. This
strategy ultimately proved unsuccessful (people preferred to buy access
directly from the phone or cable company), but it was the best hope the company
had for maintaining a dominant position as an ISP.

My team was responsible for the qualification and provisioning systems in this
effort. Qualification was particularly challenging. In order to be eligible for
DSL, you had to live within a short distance of a telephone exchange that had a
Digital Subscriber Line Access Multiplexer (DSLAM) with open capacity. And just
to make it interesting, the workers in the local chapter of the Communications
Workers of America couldn't be on strike at the time.

The experience we had to implement was to qualify users in the background
during the login flow, using the phone number we had on file. If the user was
qualified, they were prompted to upgrade to DSL.

Our qualification system had to perform well. Delays not only lowered the
conversion rate but also degraded the user experience, since users had been
conditioned to expect these login popups (remember those?) to appear early
rather than late. In addition, the system had to be capable of handling
substantial load. At the peak time of day, several hundred people per second
would log in and begin the qualification process.

This entire experience taught me three fundamental lessons.

1. It's important to not confuse performance assessment with load testing. As
mentioned, the former is all about ensuring that software reacts to inputs in
an acceptably short period of time. The latter is about understanding how much
hardware is needed to serve expected demand and whether there are any parts of
the architecture that unacceptably limit this (e.g. a single shared database
that all requests queue up behind).

2. Synthetic load testing is at best a very poor proxy for understanding how
your system will perform under actual load. Real world usage has traffic
patterns that are far more complex and varied than you're likely to come up
with in a load testing script. This matters a great deal because varied traffic
prevents code and data from being cached as effectively as they are in simple
tests.

3. The demand on a system is rarely constant. Much more typical is a daily
traffic pattern where load is high at predictable times of day and lower at
others. The users still do the same basic stuff throughout the course of the
day, but there are time clusters where there are more of them doing it. It's
at these times of highest system demand that having a good handle on your
capacity matters most.
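
To make the caching point in lesson 2 concrete, here is a minimal Python sketch,
not anything we ran at AOL. It compares the hit rate of a small LRU cache under
a scripted test that cycles through 50 phone numbers against uniformly random
lookups across 100,000 numbers. The cache size, key counts, and the uniform
distribution are all assumptions chosen for illustration; real traffic is
messier than either case.

```python
import random
from collections import OrderedDict


def hit_rate(keys, cache_size):
    """Return the fraction of lookups served by an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(keys)


# All numbers below are hypothetical and chosen only for illustration.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
N_REQUESTS = 20_000
CACHE_SIZE = 100

# Scripted load test: cycles through a small, fixed set of 50 phone numbers.
scripted = [i % 50 for i in range(N_REQUESTS)]

# Crude stand-in for production: 100,000 distinct numbers, picked uniformly.
varied = [rng.randrange(100_000) for _ in range(N_REQUESTS)]

scripted_rate = hit_rate(scripted, CACHE_SIZE)
varied_rate = hit_rate(varied, CACHE_SIZE)
print(f"scripted: {scripted_rate:.2%}  varied: {varied_rate:.2%}")
```

The scripted run misses only on its first 50 lookups, so nearly every request
is served from cache, while the varied run almost never hits. A test that looks
like the first case says little about a system that lives in the second.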

At any given stage, you need to be clear-headed about what it is you need to
understand about your application. For example, if you are concerned with how
the user experience suffers under conditions of low bandwidth or less than
modern hardware, you're dealing with a performance concern. In such cases,
tools that analyze individual transactions are sufficient. In Ruby, these
include New Relic and the bullet gem, which help you find database queries
that can be optimized. For web applications, Google PageSpeed Insights and
Yahoo's YSlow are invaluable.

If instead you're trying to understand whether the system can respond to
expected demand and scale, then load testing is what's called for. There are
tools that purport to help with this; Apache JMeter is one example. But after
my time at AOL, I am skeptical of this class of tools. As mentioned above,
there are too many differences between real and synthetic load to completely
trust the results of a test.

To complete the story of my AOL days, we learned how to do load testing right.
Once we realized that simple load tests were getting us nowhere close to the
confidence we needed to provision adequate hardware, our first reaction was to
capture actual production traffic (e.g. Apache server logs) and then build
tools to replay that against test hardware. This was better, but it was both
difficult and expensive to do, and it still lacked key real world
characteristics that are just plain difficult to model.
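
For a sense of what the replay approach involves, here is a minimal Python
sketch, not the tooling we built. It assumes the logs are in Apache's Common
Log Format and turns them into a replay schedule: each request paired with its
offset in seconds from the first one. The sample lines, the `/qualify` and
`/provision` paths, and the function name are hypothetical. Actually sending
the requests to test hardware is left out.

```python
import re
from datetime import datetime

# Matches Common Log Format: host ident user [time] "METHOD path proto" status bytes
CLF = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" \d{3} \S+')


def replay_schedule(log_lines):
    """Return (offset_seconds, method, path) tuples ordered by request time."""
    entries = []
    for line in log_lines:
        match = CLF.match(line)
        if match is None:
            continue  # skip lines that are not well-formed log entries
        stamp, method, path = match.groups()
        # Month abbreviations are parsed per the current locale (English by default).
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        entries.append((when, method, path))
    entries.sort(key=lambda entry: entry[0])
    if not entries:
        return []
    start = entries[0][0]
    return [((when - start).total_seconds(), method, path)
            for when, method, path in entries]


# Hypothetical log lines, invented for this example.
SAMPLE_LOG = [
    '192.0.2.10 - - [07/Jul/2015:20:34:00 -0400] "GET /qualify?phone=5550100 HTTP/1.1" 200 512',
    '192.0.2.11 - - [07/Jul/2015:20:34:00 -0400] "GET /qualify?phone=5550101 HTTP/1.1" 200 498',
    '192.0.2.12 - - [07/Jul/2015:20:34:02 -0400] "POST /provision HTTP/1.1" 201 64',
    'not a log line',
]

schedule = replay_schedule(SAMPLE_LOG)
for offset, method, path in schedule:
    print(f"+{offset:.0f}s {method} {path}")
```

Even this toy version hints at the limits: an access log records the request
line but not request bodies, session state, or client behavior, which is part
of why replayed traffic still fell short of the real thing.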

We eventually hit upon the realization that we could use the third lesson
above to our advantage. Who needs synthetic load testing when you can just use
the real thing and get better information? To do this, you have to have a
system that is instrumented well enough for you to know when it's stressing.
Queue sizes are measured. Failure/abandon rates can be read and interpreted.
Hardware metrics on the server (e.g. CPU/RAM levels) are exposed. Once you have
that in place, you start at a part of the day when real load is low and
you remove some of your production capacity (e.g. load balancer adjustment).
Then you watch your numbers as traffic grows over the course of the day. Once
your instrumentation tells you that user performance is starting to degrade to
unacceptable levels, you flip your load balancer configuration back so you are
again at full capacity. After this exercise, you have an understanding of
system performance under load that you can be much more confident in than
anything synthetic load testing - even good load derived from actual production
usage - can tell you.
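
The arithmetic behind that exercise can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not our actual tooling. The
thresholds, readings, and server counts are made up, and it assumes capacity
scales roughly linearly with the number of servers, which a shared bottleneck
such as a single database would break.

```python
import math
from dataclasses import dataclass


@dataclass
class Reading:
    """One instrumentation sample taken while running at reduced capacity."""
    requests_per_sec: float
    queue_depth: int
    abandon_rate: float  # fraction of users who give up, from 0 to 1


# Hypothetical limits; real ones come from what your users will tolerate.
MAX_QUEUE_DEPTH = 50
MAX_ABANDON_RATE = 0.02


def is_degraded(reading):
    """True when a sample says user-facing performance is unacceptable."""
    return (reading.queue_depth > MAX_QUEUE_DEPTH
            or reading.abandon_rate > MAX_ABANDON_RATE)


def sustainable_load(readings):
    """Highest request rate seen before the first degraded sample."""
    best = 0.0
    for reading in readings:
        if is_degraded(reading):
            break  # this is the point where you restore full capacity
        best = max(best, reading.requests_per_sec)
    return best


def servers_needed(peak_rps, per_server_rps, headroom=0.25):
    """Servers required for peak load plus headroom, assuming linear scaling."""
    return math.ceil(peak_rps * (1 + headroom) / per_server_rps)


# Made-up samples as traffic grows through the day with 4 servers in rotation.
SERVERS_IN_ROTATION = 4
readings = [
    Reading(120, 5, 0.001),
    Reading(200, 12, 0.003),
    Reading(280, 30, 0.010),
    Reading(330, 80, 0.040),  # queue and abandon limits both exceeded
]

per_server = sustainable_load(readings) / SERVERS_IN_ROTATION
print(per_server)                       # 70.0
print(servers_needed(400, per_server))  # 8
```

In this made-up run, four servers held up through 280 requests per second, or
70 per server, so a 400 request per second peak with 25% headroom calls for
eight servers.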

No approach to performance or load testing is foolproof. You have to resign
yourself to getting battle scars that inform future system design and
implementation. But it all starts with an understanding of the differences
between performance and load testing.