kcmannem · August 29, 2019 02:59 · ddadlani · Aug 30, 2019 · kcmannem · Aug 30, 2019
diff --git a/gistfile1.txt b/gistfile1.txt
 # Cost of Concourse

 If we're able to dissect and expose the costs of running Concourse, we can better answer customer questions about why they're spending so much on this tool. Costs caused by running a customer's workload may get grouped in as a cost of running Concourse itself. With a framework that decomposes the fixed, variable, and marginal costs, we can better navigate and control this conversation.

 # Types of Cost

 By design, Concourse has a set of cluster management components that drive costs up when compared to a similarly sized worker pool on other build systems. We can classify this as the __Fixed Cost__ of running Concourse. Regardless of the deployment size, this cost has to be paid at minimum. The supported deployment method of Concourse for our customers is through BOSH, so we mustn't forget its costs as well.

 This is what I see as the minimum well-running deployment scheme:

 	Fixed_Cost = BOSH Director + 1 LB + 2 Web + 1 DB

 	* 2 Web nodes are required for HA
 	* BOSH creates an LB, which is also required for a multi-web environment
 	* Director is needed for day1/2 operations
 	* Concourse has never used more than 1 DB
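
As a rough illustration of the formula above, here is a minimal Python sketch. The monthly prices are made-up placeholders, not real IaaS or BOSH pricing, and the names (`FixedCostPrices`, `fixed_cost`) are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedCostPrices:
    """Monthly unit prices. All values used below are hypothetical placeholders."""
    director: float       # BOSH Director VM
    load_balancer: float  # the single LB
    web: float            # one Web node
    db: float             # the single DB


def fixed_cost(p: FixedCostPrices) -> float:
    """Fixed_Cost = BOSH Director + 1 LB + 2 Web + 1 DB."""
    return p.director + 1 * p.load_balancer + 2 * p.web + 1 * p.db


# Placeholder prices, chosen only to make the arithmetic easy to follow.
prices = FixedCostPrices(director=50.0, load_balancer=20.0, web=50.0, db=100.0)
baseline = fixed_cost(prices)
print(baseline)
```

Keeping the Web count fixed at 2 here is deliberate: any Web node beyond the HA pair is treated as a marginal cost rather than part of the baseline.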

 It's important to note that the size of the VMs created for these components also affects the cost of operation. Looking at our internal deployments, I'd argue that we can disregard this variable, as we tend to scale horizontally rather than vertically (`n1-standard` - 2 CPUs, 8 GB).
 You could argue that some deployments have more than 2 web nodes, but I'd file this under marginal cost, as it has more to do with how much it would cost to run an additional build on top of the baseline that the fixed cost provides.

 The __Variable Cost__ is determined by the size of the worker pool being used. Bigger or more workers provide customers with a larger throughput of builds. Concourse affects this throughput slightly (possibly more) because it co-locates administrative workloads with customer workloads and places load in non-optimal ways. These come in the form of check containers (and possibly other things I'm forgetting) and placement strategies. Our best-case scenario is that workloads run on Concourse have the same throughput and cost as if the allocated worker pool had been running these scripts without the Concourse middleware.

 	Variable_Cost = N(Workers)
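
A minimal sketch of this relationship in Python, under stated assumptions: `price_per_worker` and `overhead_fraction` are hypothetical parameters. In particular, `overhead_fraction` stands in for the share of worker capacity consumed by check containers and sub-optimal placement, which has not been measured here.

```python
def variable_cost(n_workers: int, price_per_worker: float) -> float:
    """Variable_Cost = N(Workers): cost grows linearly with the pool size."""
    if n_workers < 0:
        raise ValueError("n_workers must be non-negative")
    return n_workers * price_per_worker


def effective_workers(n_workers: int, overhead_fraction: float) -> float:
    """Worker capacity left for customer builds after Concourse's own load.

    overhead_fraction is a hypothetical knob; 0.0 is the best case in which
    Concourse adds no overhead on top of the raw worker pool.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return n_workers * (1.0 - overhead_fraction)


# Placeholder numbers: 10 workers at 50.0 each.
cost = variable_cost(n_workers=10, price_per_worker=50.0)
best_case = effective_workers(10, overhead_fraction=0.0)
with_overhead = effective_workers(10, overhead_fraction=0.25)
print(cost, best_case, with_overhead)
```

The customer pays the same `cost` in both cases; only the capacity available to their builds changes, which is why overhead shows up as a throughput problem rather than as a line item.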

 The balance we want to find here when comparing against other tooling could be:

 	build throughput ~ avg build time ~ variable_cost

 We can make an argument for sacrificing one of the three, but we're probably worse on all of them currently.

 One of the trickiest unanswered questions is "How much does each additional build cost?". Its answer lies in how many builds Concourse can even run. Our fixed cost per build is just `fixed_cost/n_builds`, but this cost curve is not flat. Each web node manages a limited number of builds well before additional nodes need to be added. Therefore the __Marginal Cost__ is dictated by how well Concourse scales (`builds`/`web`).
 How build events are triggered throughout the pipeline is dictated by heavy DB queries. At some point the DB will have to be scaled vertically. We have no insight into when exactly to do this. We do not have any benchmarks for how many builds a single or additional Web node can manage either (Clara's work with the algorithm gave us some numbers which we can use in the future).
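
To make the "not flat" cost curve concrete, here is a minimal Python sketch. It assumes a hypothetical capacity `builds_per_web` (no such benchmark exists yet, as noted above), placeholder prices, and it ignores vertical DB scaling entirely. All function names are illustrative.

```python
def web_nodes_needed(n_builds: int, builds_per_web: int, min_web: int = 2) -> int:
    """Web nodes needed for n_builds, never fewer than the 2 required for HA."""
    if builds_per_web <= 0:
        raise ValueError("builds_per_web must be positive")
    needed = -(-n_builds // builds_per_web)  # integer ceiling division
    return max(min_web, needed)


def control_plane_cost(n_builds: int, builds_per_web: int,
                       web_price: float, other_fixed: float) -> float:
    """Director + LB + DB (other_fixed) plus however many Web nodes are needed."""
    return other_fixed + web_nodes_needed(n_builds, builds_per_web) * web_price


def marginal_cost(n_builds: int, builds_per_web: int,
                  web_price: float, other_fixed: float) -> float:
    """Extra control-plane cost of running one more build on top of n_builds."""
    return (control_plane_cost(n_builds + 1, builds_per_web, web_price, other_fixed)
            - control_plane_cost(n_builds, builds_per_web, web_price, other_fixed))


# Hypothetical numbers: 100 builds per Web node, 50.0 per Web node,
# 170.0 for Director + LB + DB combined.
args = dict(builds_per_web=100, web_price=50.0, other_fixed=170.0)
for n in (100, 200, 201):
    total = control_plane_cost(n, **args)
    # Columns: builds, total cost, cost per build, cost of one more build.
    print(n, total, round(total / n, 3), marginal_cost(n, **args))
```

Under these assumptions the marginal cost is zero until a Web node's capacity is exhausted, then jumps by the price of one Web node, so cost per build falls steadily and then steps back up at each threshold.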

 # Thoughts

 In going through this exercise, we shouldn't get discouraged by the inefficiencies, and we shouldn't be ignorant of the costs. Customers should know the value they're paying for. I came across this:

 *"It is generally better to optimize your price for your own value provided to customers, especially if you are offering more value than the competition."*