Created
June 24, 2016 16:08
Hi all,
At Harvard DCE we've been developing a few strategies for automated horizontal scaling of workers. Some involve analysis of AWS CloudWatch metrics and others look at data from Matterhorn itself, e.g. queued jobs. All of them, to a greater or lesser degree, are hindered by the behavior of the job dispatching mechanisms. I'll give a quick example of a typical scenario and why it causes problems for both types of strategies.
Say we have a cluster that has 10 workers at its disposal but only 1 currently online. A burst of producer activity creates a bunch of new workflows, and quickly a dozen media inspection and compose jobs are generated. What we would expect is that the `max_jobs: n` setting on the worker host would prevent more than n of those jobs from being assigned to the single worker, with the rest being put into some sort of queued state. What actually happens is that the one worker happily accepts all of them, because the dispatching logic in `matterhorn-serviceregistry/.../ServiceRegistryJpaImpl.java` doesn't take `max_jobs` into account for these types of jobs.
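To make the scenario concrete, here is a toy simulation of the two behaviors. This is only a sketch and not Matterhorn code: the `dispatch` function, the `enforce_max_jobs` flag, the worker name, and the `max_jobs` value of 4 are all made up for illustration.

```python
def dispatch(num_jobs, workers, enforce_max_jobs):
    """Toy dispatcher: hand each job to the least-loaded online worker.

    workers maps a worker name to its max_jobs setting. This is a
    hypothetical model, not the logic in ServiceRegistryJpaImpl.java.
    Returns (assigned, queued), where assigned maps worker -> job count.
    """
    assigned = {name: 0 for name in workers}
    queued = 0
    for _ in range(num_jobs):
        if enforce_max_jobs:
            # Only workers still under their max_jobs limit may take a job.
            candidates = [w for w in workers if assigned[w] < workers[w]]
        else:
            # The limit is ignored: every online worker is always eligible.
            candidates = list(workers)
        if not candidates:
            queued += 1
            continue
        target = min(candidates, key=lambda w: assigned[w])
        assigned[target] += 1
    return assigned, queued


# One worker online, assumed max_jobs of 4, burst of a dozen jobs.
online = {"worker-1": 4}
print(dispatch(12, online, enforce_max_jobs=False))  # ({'worker-1': 12}, 0)
print(dispatch(12, online, enforce_max_jobs=True))   # ({'worker-1': 4}, 8)
```

The first call corresponds to what we observe (everything lands on the one worker, nothing is queued); the second is what we would expect if `max_jobs` were honored for these job types.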
Because a burst of activity doesn't result in any jobs being queued, automated scaling up based on the queued-job count from the Matterhorn statistics is a no-go. To a lesser extent it creates a drag factor for load-based scaling as well; we can spin up 2 more workers in response to the original worker's load, but they will initially be underutilized because the first worker is still slowly chugging away on the excessive number of jobs it already accepted.
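As a rough illustration of why the queued-job count is useless here, consider a simple queue-based scale-up rule. This is a hypothetical sketch, not our actual scaling code: `workers_to_add` and all of its parameters are assumptions, as is the `max_jobs` value of 4.

```python
import math


def workers_to_add(queued_jobs, online_workers, max_workers, max_jobs_per_worker):
    """Hypothetical rule: start enough workers to absorb the queued jobs,
    without exceeding the number of workers the cluster has available."""
    if queued_jobs <= 0:
        return 0
    wanted = math.ceil(queued_jobs / max_jobs_per_worker)
    available = max(0, max_workers - online_workers)
    return min(wanted, available)


# Observed: all 12 jobs were accepted by the one worker, so nothing is queued.
print(workers_to_add(queued_jobs=0, online_workers=1,
                     max_workers=10, max_jobs_per_worker=4))  # 0

# Expected if max_jobs were enforced: 8 jobs would sit in the queue.
print(workers_to_add(queued_jobs=8, online_workers=1,
                     max_workers=10, max_jobs_per_worker=4))  # 2
```

With the current dispatching behavior the rule only ever sees a queue of zero, so it never fires, no matter how overloaded the single worker is.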
Questions:
Is there something we're missing here?
Has anyone explored applying the `max_jobs` setting to a wider range of job types?