i wanted a text-manifest driven solution to the problem of knowing what's on an instance and how it is configured. i especially didn't want the manifest to contain any programming language so that there are fewer barriers to understanding the instance state and so that the effects of applying the manifest are as transparent as can be.
puppet ticks many of the boxes for these requirements, but the showstopper was that a puppet agent does not exist on a vanilla ec2 instance. that means something else has to install puppet, and you end up needing 2 things to solve 1 problem.
ec2 only gives you 2 options for bootstrapping a windows instance: cmd and powershell. cmd doesn't have any way of downloading something from the internet (without first having some tool like wget pre-exist on the instance); powershell does. out of the box, powershell can do lots of things, but the ability to download over http is the kicker.
first, and mostly, because it's available (as in, built into powershell). second, its sole purpose is to create and maintain an instance in its desired state. since i didn't want to reinvent that wheel, i used the tools available to me.
it diffs nicely and it provides the robustness for interacting with other systems.
it's a set of powershell scripts and json manifests that bootstrap an instance to ensure that it contains everything listed in its manifest.
desired state configuration (dsc) is a powershell syntax for describing the state instances should be in: what software and which version should be installed, and what firewall, registry and environment settings should exist. it's built in to powershell 4 and later, so it works out of the box on win2012r2/win10 and later, and it is backwards compatible to win7 with a bit of contortion.
it converts a json manifest into a powershell dsc configuration and then lets dsc do all the heavy lifting.
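as a rough illustration of that conversion, here is a minimal python sketch that turns a json manifest into the text of a dsc configuration. it is not occ's code: the manifest schema (`components`, `type`, `name`, `value`, `path`), the `render` function and the `ExampleConfig` name are all made up for the example. the only things borrowed from dsc itself are the built-in `Environment` and `File` resources and their property names.

```python
import json

# hypothetical manifest: this schema is invented for the sketch, it is not occ's format
MANIFEST = """
{
  "components": [
    {"type": "environment", "name": "EXAMPLE_FLAG", "value": "1"},
    {"type": "directory", "name": "BuildsDir", "path": "C:/builds"}
  ]
}
"""


def render(manifest_text, config_name="ExampleConfig"):
    """Return powershell dsc configuration text for the given json manifest."""
    manifest = json.loads(manifest_text)
    lines = [f"Configuration {config_name} {{", "  Node 'localhost' {"]
    for c in manifest["components"]:
        if c["type"] == "environment":
            # maps to the built-in dsc Environment resource
            lines += [
                f"    Environment {c['name']} {{",
                f"      Name = '{c['name']}'",
                f"      Value = '{c['value']}'",
                "      Ensure = 'Present'",
                "    }",
            ]
        elif c["type"] == "directory":
            # maps to the built-in dsc File resource in directory mode
            lines += [
                f"    File {c['name']} {{",
                f"      DestinationPath = '{c['path']}'",
                "      Type = 'Directory'",
                "      Ensure = 'Present'",
                "    }",
            ]
        else:
            raise ValueError(f"unknown component type: {c['type']}")
    lines += ["  }", "}"]
    return "\n".join(lines)


print(render(MANIFEST))
```

the point of the design is that the manifest stays pure data, and all of the "how" lives in the mapping from component type to dsc resource.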
dsc uses a pattern of a test script and an implementation script (for each thing in the manifest) to check if something is already in the desired state and put it there if it isn't. so for example a test script might check whether the output of hg --version contains 3.9.1 and the implementation script will download and run an installer for hg version 3.9.1.
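the test/implementation pattern can be sketched in a few lines of python. this is only a model of the idea, not dsc or occ code: `Resource`, `apply` and the `instance` dict are names invented for the example, and the "installer" just updates the dict instead of downloading anything.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Resource:
    name: str
    test: Callable[[], bool]       # returns True when already in the desired state
    implement: Callable[[], None]  # puts the instance into the desired state


def apply(resources: List[Resource]) -> List[str]:
    """Run each test and only run the implementation when the test fails.

    Returns the names of the resources that had to be changed.
    """
    changed = []
    for r in resources:
        if not r.test():
            r.implement()
            changed.append(r.name)
    return changed


# stand-in for the instance: what `hg --version` would print (empty = not installed)
instance = {"hg_version_output": ""}


def hg_test() -> bool:
    return "3.9.1" in instance["hg_version_output"]


def hg_implement() -> None:
    # a real implementation script would download and run the hg 3.9.1 installer
    instance["hg_version_output"] = "Mercurial Distributed SCM (version 3.9.1)"


resources = [Resource("hg", hg_test, hg_implement)]
first = apply(resources)   # hg is missing, so the implementation runs
second = apply(resources)  # already in the desired state, so nothing runs
print(first, second)
```

because the test runs before every implementation, applying the same manifest repeatedly is safe, which is what makes it reasonable to run it again on a schedule.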
dsc normally runs as a scheduled task under the system user account every 15 minutes. that's disabled on test worker instances so that system performance is not interfered with. it's left enabled on build worker instances to allow us to respond quickly to change requirements or support incidents.
when we create amis, we use occ to put an instance into desired state at bootup. then we shut it down and capture the ami. this happens automatically whenever the master branch of the occ github repo is changed thanks to a tc-github task. we run occ again on each spot instance as it boots. we then either run or disable dsc based on the worker type.