Created
August 29, 2019 02:59
# Cost of Concourse
If we're able to dissect and expose the costs of running Concourse, we can better answer customer questions about why they're spending so much on this tool. Costs caused by running a customer's workload may get grouped in as a cost of running Concourse itself. With a framework that decomposes the fixed, variable, and marginal costs, we can better navigate and control this conversation.
# Types of Cost
By design, Concourse has a set of cluster management components that drive costs up compared to a similarly sized worker pool on other build systems. We can classify this as the __Fixed Cost__ of running Concourse. Regardless of the deployment size, this cost has to be paid at minimum. The supported deployment method of Concourse for our customers is through BOSH, so we mustn't forget its costs either.
This is what I see as the minimum well-running deployment scheme:
Fixed_Cost = BOSH Director + 1 LB + 2 Web + 1 DB
* 2 Web nodes are required for HA
* BOSH creates an LB, which is also required for a multi-web environment
* Director is needed for day1/2 operations
* Concourse has never used more than 1 DB
It's important to note that the size of the VMs created for these components also affects the cost of operation. Looking at our internal deployments, I'd argue that we can disregard this variable, as we tend to scale (`n1-standard` - 2 CPUs, 8 GB) horizontally rather than vertically.
You could argue that some deployments have more than 2 web nodes, but I'd file this under marginal cost, as it has more to do with how much it costs to run an additional build on top of the baseline the fixed cost provides.
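The fixed-cost formula above can be sketched as a small calculation. This is only an illustration: the price table and the `fixed_cost` helper are hypothetical names, and the dollar figures are placeholders rather than real quotes for any IaaS.

```python
# Hypothetical monthly prices in USD. These are placeholders for
# illustration only, not measured or quoted values.
HYPOTHETICAL_MONTHLY_PRICE = {
    "bosh_director": 50.0,
    "load_balancer": 20.0,
    "web": 50.0,
    "db": 100.0,
}


def fixed_cost(prices, directors=1, lbs=1, web_nodes=2, dbs=1):
    """Fixed_Cost = BOSH Director + 1 LB + 2 Web + 1 DB.

    The defaults mirror the minimum deployment scheme described above.
    """
    return (
        directors * prices["bosh_director"]
        + lbs * prices["load_balancer"]
        + web_nodes * prices["web"]
        + dbs * prices["db"]
    )


baseline = fixed_cost(HYPOTHETICAL_MONTHLY_PRICE)
print(f"baseline fixed cost per month: {baseline}")
```

Passing `web_nodes=3` shows what a third web node adds on top of the baseline, which is the part this note files under marginal cost.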
The __Variable Cost__ is determined by the size of the worker pool being used. Bigger or more workers give customers higher build throughput. Concourse reduces this throughput somewhat (possibly more than I assume) because it runs administrative workloads alongside customer workloads and places load in non-optimal ways. This overhead comes from check containers (and possibly other things I'm forgetting) and from placement strategies. Our best-case scenario is that workloads run on Concourse have the same throughput and cost as if the allocated worker pool had been running these scripts without the Concourse middleware.
Variable_Cost = N(Workers)
The balance we want to find here when comparing against other tooling could be:
build throughput ~ avg build time ~ variable_cost
We can make an argument about sacrificing one of the three, but we're probably worse at all of them currently.
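To make the relationship between worker count, overhead, and cost per build concrete, here is a minimal sketch. It assumes the administrative overhead (check containers, placement inefficiency) can be modelled as a single fraction of worker capacity, which is a simplification I'm introducing, not something we have measured. All function and parameter names are hypothetical.

```python
def variable_cost(n_workers, price_per_worker):
    """Variable_Cost = N(Workers), priced per worker per period."""
    return n_workers * price_per_worker


def effective_throughput(n_workers, builds_per_worker, overhead_fraction):
    """Builds per period left for customer workloads.

    overhead_fraction is the assumed share of worker capacity consumed by
    administrative work such as check containers (0.0 = best case).
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0.0, 1.0)")
    return n_workers * builds_per_worker * (1.0 - overhead_fraction)


def variable_cost_per_build(n_workers, price_per_worker,
                            builds_per_worker, overhead_fraction):
    """Variable cost divided by the builds the pool can actually run."""
    cost = variable_cost(n_workers, price_per_worker)
    builds = effective_throughput(n_workers, builds_per_worker,
                                  overhead_fraction)
    return cost / builds


# Best case (no overhead) versus an assumed 50% overhead, placeholder numbers.
best_case = variable_cost_per_build(10, 100.0, 500, 0.0)
with_overhead = variable_cost_per_build(10, 100.0, 500, 0.5)
print(best_case, with_overhead)
```

With `overhead_fraction=0.0` this reduces to the best-case scenario described above, where the worker pool costs the same per build as it would without Concourse in the middle.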
One of the trickiest unanswered questions is "How much does each additional build cost?" Its answer depends on how many builds Concourse can run at all. Our fixed cost per build is just `fixed_cost/n_builds`, but this cost curve is not flat. Each web node manages a limited number of builds well before additional nodes need to be added. Therefore the __Marginal Cost__ is dictated by how well Concourse scales (`builds`/`web`).
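The non-flat cost curve can be sketched as a step function. This assumes each web node handles a fixed number of builds (`builds_per_web`) before another node is needed. That capacity is exactly the number we don't have a benchmark for, so the value used below is a placeholder, and all names are hypothetical.

```python
def web_nodes_needed(n_builds, builds_per_web, min_web=2):
    """Web nodes required for n_builds, never fewer than the HA minimum."""
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = (n_builds + builds_per_web - 1) // builds_per_web
    return max(min_web, needed)


def total_fixed_cost(n_builds, base_cost, web_price, builds_per_web):
    """base_cost covers BOSH Director + LB + DB; web nodes are added on top."""
    return base_cost + web_nodes_needed(n_builds, builds_per_web) * web_price


def fixed_cost_per_build(n_builds, base_cost, web_price, builds_per_web):
    """fixed_cost / n_builds, with the web node count stepping up as needed."""
    if n_builds < 1:
        raise ValueError("n_builds must be at least 1")
    return total_fixed_cost(n_builds, base_cost, web_price,
                            builds_per_web) / n_builds


def marginal_cost(n_builds, base_cost, web_price, builds_per_web):
    """Extra fixed cost incurred by going from n_builds to n_builds + 1."""
    return (total_fixed_cost(n_builds + 1, base_cost, web_price, builds_per_web)
            - total_fixed_cost(n_builds, base_cost, web_price, builds_per_web))


# Placeholder inputs: base cost 170, web node price 50, 100 builds per web node.
for n in (100, 200, 201):
    print(n, web_nodes_needed(n, 100), fixed_cost_per_build(n, 170.0, 50.0, 100))
```

In this model the marginal cost is zero for most builds and jumps by the price of one web node whenever the build count crosses a capacity boundary. It ignores the point where the DB has to be scaled vertically.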
Build events are triggered throughout the pipeline by heavily loaded DB queries. At some point the DB will have to be scaled vertically, but we have no insight into exactly when to do this. We also have no benchmarks for how many builds a single or additional web node can manage (Clara's work with the algorithm gave us some numbers which we can use in the future).
# Thoughts
By going through this exercise, we shouldn't get discouraged by the inefficiencies, and we shouldn't be ignorant of the costs. Customers should know the value they're paying for. I came across this:
*"It is generally better to optimize your price for your own value provided to customers, especially if you are offering more value than the competition."*
@ddadlani I'd put that under the marginal cost. I had a short chat with James, who mentioned that customers don't scale unless we tell them to, and they'll use whatever we tell them in the beginning or whatever the PA set up. But you're right, part of the next steps will be to gather this data.
Can we back this up with information from current heavy users/customers of Concourse? I like what you have outlined, but without concrete data we run the risk of assuming certain things, e.g. that `web` is a fixed cost. Have we seen users scale up web due to high load before? How often does that happen? I agree that workers are much more likely to be scaled up than web, but it'd be nice to have that data stored somewhere.