Load testing vs performance testing
A common mistake made in our industry is to conflate performance testing with
load testing. While both are important, neither is a substitute for the other.
The former helps assess whether a system is performant. Can it accomplish some
task in a short enough period of time to be effective? The latter is used to
understand how the performance changes as more and more demand is put on the
system. I came to appreciate the value of both while working for AOL, back when
30 million people looked to that service as the way to get online.
At the time, we tried to convert people from dial-up access to broadband while
simultaneously maintaining the billing relationship with the customer. This
strategy ultimately proved unsuccessful (people preferred to buy access
directly from the phone or cable company), but it was the best hope the company
had for maintaining a dominant position as an ISP.
My team was responsible for the qualification and provisioning systems in this
effort. Qualification was particularly challenging. In order to be eligible for
DSL, you had to live within a short distance of a telephone exchange that had a
Digital Subscriber Line Access Multiplexer (DSLAM) with open capacity. And just
to make it interesting, the workers in the local chapter of the Communications
Workers of America couldn't be on strike at the time.
The experience we had to implement was to qualify users in the background
during the login flow, using the phone number we had on file. If the user
qualified, they were prompted to upgrade to DSL.
Our qualification system had to perform well. Delays not only hurt the
conversion rate but also degraded the user experience, since users had been
conditioned to expect these login popups (remember those?) to appear early
rather than late. In addition, the system had to be capable of handling
substantial load: at the peak time of day, several hundred people per second
would log in and begin the qualification process.
This entire experience taught me three fundamental lessons.
1. It's important not to confuse performance assessment with load testing. As
mentioned, the former is all about ensuring that software reacts to inputs in
an acceptably short period of time. The latter is about understanding how much
hardware is needed to serve expected demand and whether any parts of the
architecture unacceptably limit this (e.g. a single shared database that all
requests queue up behind). A sketch contrasting the two follows this list.
2. Synthetic load testing is at best a poor proxy for understanding how your
system will perform under actual load. Real-world usage has traffic patterns
far more complex and varied than anything you're likely to capture in a load
testing script. This matters a great deal, because varied traffic prevents
code and data from being cached the way they are in simple tests.
3. The demand on a system is rarely constant. Much more typical is a daily
traffic pattern where load is high at predictable times of day and lower at
others. Users do the same basic things throughout the day, but there are
clusters of time when more of them are doing those things at once. It's only
at these times of peak demand that having a good handle on your capacity
really matters.
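To make the distinction in the first lesson concrete, here is a minimal Ruby
sketch. The endpoint, budget, and concurrency levels are illustrative
assumptions, not details from the AOL system: the first check asks whether a
single request is fast enough (performance), while the second ramps up
concurrency and watches how timing degrades (load).

```ruby
require "net/http"
require "benchmark"

# Hypothetical staging endpoint; the real qualification service is not public.
TARGET = URI("http://staging.example.com/qualify")

# Performance assessment: is one request acceptably fast?
single = Benchmark.realtime { Net::HTTP.get_response(TARGET) }
puts format("single request: %.3fs", single)

# Load testing: how does total time change as concurrency grows?
[1, 10, 50, 100].each do |concurrency|
  elapsed = Benchmark.realtime do
    concurrency.times.map { Thread.new { Net::HTTP.get_response(TARGET) } }
               .each(&:join)
  end
  puts format("%3d concurrent requests: %.3fs", concurrency, elapsed)
end
```

The first number answers "is it fast?"; the second series answers "how does
fast change under pressure?" Those are different questions, and conflating
them is exactly the mistake this post is about.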
At any given stage, you need to be clear-headed about what it is you need to
understand about your application. For example, if you are concerned with how
the user experience suffers under low bandwidth or on less-than-modern
hardware, you're dealing with a performance concern. In such cases, tools
that analyze individual transactions are sufficient. In Ruby, this means
tools like New Relic and the bullet gem, which flag database queries that can
be optimized. For web applications, Google PageSpeed Insights and Yahoo's
YSlow are invaluable.
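As a concrete example of transaction-level analysis, here is a minimal bullet
setup for a Rails development environment. This is a sketch: the settings
shown are real bullet options, but exact configuration varies by project.

```ruby
# config/environments/development.rb
config.after_initialize do
  Bullet.enable        = true  # turn N+1 query detection on
  Bullet.bullet_logger = true  # write findings to log/bullet.log
  Bullet.rails_logger  = true  # also surface them in the Rails log
  Bullet.alert         = true  # pop a JavaScript alert in the browser
end
```

With this in place, bullet flags N+1 queries and unused eager loading as you
click through the app, which is exactly the single-transaction view a
performance concern calls for.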
If instead you're trying to understand whether the system can respond to
expected demand and scale, then load testing is what's called for. There are
tools that purport to help with this; Apache's JMeter is one example. But
after my time at AOL, I consider myself a skeptic of this brand of tool. As
mentioned above, there are too many differences between real and synthetic
load to completely trust the results of a test.
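A sketch of the root of that skepticism, with a hypothetical endpoint and
phone numbers of my own invention: a naive synthetic script replays the same
input over and over, which keeps every cache warm, while real users scatter
requests across the key space.

```ruby
require "net/http"

# Hypothetical endpoint; real qualification keyed on the user's phone number.
BASE = "http://staging.example.com/qualify?phone="

# What a naive synthetic script does: the same input repeatedly. After the
# first request, every cache layer is warm, so measured latency flatters
# the system.
100.times { Net::HTTP.get_response(URI(BASE + "7035550100")) }

# Real traffic is closer to this: requests scattered across the key space,
# defeating caches and exercising cold code and data paths.
100.times { Net::HTTP.get_response(URI(BASE + format("703555%04d", rand(10_000)))) }
```

Even this second loop only crudely approximates real traffic; it's meant to
show how easily a test script can measure the cache rather than the system.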
To complete the story of my AOL days, we did learn how to do load testing
right. Once we realized that simple load tests were getting us nowhere near
the confidence we needed to provision adequate hardware, our first reaction
was to capture actual production traffic (e.g. Apache server logs) and build
tools to replay it against test hardware. This was better, but it was
difficult and expensive to do, and it still lacked key real-world
characteristics that are just plain hard to model.
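As a rough illustration of the replay idea (the actual AOL tooling was more
involved; the host name and log format here are assumptions), a script along
these lines can re-issue logged GET requests against test hardware:

```ruby
require "net/http"

# Hypothetical replay target.
TEST_HOST = "staging.example.com"

# Replay GET requests from an Apache access log. Assumes the common log
# format, where the request line is the first double-quoted field,
# e.g. "GET /some/path HTTP/1.0".
File.foreach("access.log") do |line|
  request_line = line[/"([^"]+)"/, 1] or next
  method, path, = request_line.split
  next unless method == "GET"
  Net::HTTP.get_response(URI("http://#{TEST_HOST}#{path}"))
end
```

Note what even this leaves out: request timing and bursts, POST bodies,
session state, and client behavior like retries. Those gaps are the
"real-world characteristics" that are so hard to model.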
We eventually hit upon the realization that we could use lesson three to our
advantage. Who needs synthetic load testing when you can use the real thing
and get better information? To do this, you have to have a system that is
instrumented well enough for you to know when it's under stress. Queue sizes
are measured. Failure and abandonment rates can be read and interpreted.
Server hardware metrics (e.g. CPU and RAM utilization) are exposed. Once you
have that in place, you start at a part of the day when real load is low and
remove some of your production capacity (e.g. via a load balancer
adjustment). Then you watch your numbers as traffic grows over the course of
the day. Once your instrumentation tells you that user-facing performance is
starting to degrade to unacceptable levels, you flip your load balancer
configuration back so you are again at full capacity. After this exercise,
you have an understanding of system performance under load that you can be
far more confident in than anything synthetic load testing can tell you, even
tests driven by traffic captured from actual production usage.
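A minimal sketch of the watchdog this technique implies. The metric names,
endpoint, budget, and restore script are all hypothetical; the original
system's instrumentation and load balancer mechanics were more involved.

```ruby
require "net/http"
require "json"

# Hypothetical instrumentation endpoint and degradation threshold.
METRICS    = URI("http://lb.example.com/metrics")
P95_BUDGET = 0.5 # seconds of request latency before we call it degraded

loop do
  stats = JSON.parse(Net::HTTP.get(METRICS))
  p95   = stats.fetch("p95_latency_seconds")
  queue = stats.fetch("queue_depth")
  puts format("p95=%.3fs queue=%d", p95, queue)

  if p95 > P95_BUDGET
    # Degradation observed: flip the load balancer back to full capacity.
    system("./restore_full_capacity.sh") # hypothetical operator script
    break
  end
  sleep 60
end
```

The design point is that the experiment has a built-in abort: the moment real
users start to feel it, you restore capacity, and the data you've collected
up to that point is your capacity measurement.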
No approach to performance or load testing is foolproof. You have to resign
yourself to getting battle scars that inform future system design and
implementation. But it all starts with an understanding of the differences
between performance and load testing.