The first thing we need is an EC2 AMI to base the new platform instances on (this is not the Golden AMI; it is the parent of the Golden AMI). In the case of Windows Server 2008, Q & Mark had already created a suitable parent AMI, using MDT I believe (further documentation to follow as my understanding develops).
Cloud Tools contains the tools and configuration for EC2 AMI (and instance) creation.
- Create a platform configuration file in the configs folder. For Windows Server 2008, I created b-2008 and y-2008. At the time of writing, b-2008 is the naming convention for a Windows 2008 build server; y-2008 is the try equivalent.
  - hostname
    "hostname": "b-2008-ec2-%03d"
    The hostname field value is a Python format string indicating the default naming convention for on-demand instances of the platform. In the example above, %03d will be replaced with a zero-padded number, 3 digits long.
    - Spot hostnames are configured in the slaves table of the relevant slavealloc database and do not use this value.
    - Golden hostnames are provided as an argument to create_instance.py and do not use this value.
  - region (section)
    "us-east-1": { ... }
    Create region sections for us-east-1 and us-west-2. At the time of writing, these regions provide us with the greatest cost efficiency for our specific requirements.
    Within each region section:
    - "type": "b-2008"
      The type value denotes the platform identifier and will also be used in the moz-type instance/AMI tags.
    - "domain": "build.releng.use1.mozilla.com"
      The domain value is pretty self-explanatory; note the inclusion of build/try and the AWS region in the naming convention.
    - "ami": "ami-1d47b276"
      The ami value denotes the EC2 AMI id used to generate the Golden AMI. It will also be used by any create_instance.py call that references this configuration file.
    - "subnet_ids": [ "subnet-2ba98340", ... ]
      The subnet_ids value is an array of subnets suitable for the AWS region, platform and instance type (build/try). For b-2008, I copied the values used by linux build. For y-2008, I copied the values used by linux try. Take care to select subnets that will give your platform instances network access to the resources they require. In the case of (b|y)-2008, the necessary firewall configurations should match those used by the copied platforms.
    - "security_group_ids": [ "sg-e758e982", ... ]
      The security_group_ids value is an array of security groups suitable for the platform. For b-2008, I copied the values used by linux build. For y-2008, I copied the values used by linux try. Take into consideration the types of interactions (rdp, ssh, http, etc.) you will need with the platform instances.
    - "instance_type": "c3.2xlarge"
      The instance_type value denotes the EC2 virtual machine specifications. For an explanation of the instance types, see: https://aws.amazon.com/ec2/instance-types/. Compute Optimized instances tend to be a sensible choice for build/compilation tasks.
    - "distro": "win2008"
      The distro value denotes the platform distribution. It seems this value is only used by cloud-tools to handle platform-specific logic like: if not distro.startswith('win'), etc. At the time of writing, cloud-tools distro values do not equate to slave-alloc or AWS distro values, and I have not seen logic that compares them or requires them to equate. The value should probably be deprecated out of cloud-tools, with the logic refactored to examine moz-type instead, since the strings in use are rather arbitrary.
    The config file contains further values that are either historical baggage, or not understood (by me) enough for documentation here.
- Create a platform userdata file (also in the configs folder). The file must be named the same as the platform config file and suffixed with .user-data. This file is passed by cloud-tools to EC2, which in turn loads it onto spawned instances. It is the primary mechanism for loading instance-specific configuration onto new EC2 instances. In the case of our Windows instances, it is used to provide instances with their hostname (and a PowerShell script that applies it, amongst other things). Userdata is Base64 encoded before being sent to the instance and the encoded value must stay within a fixed size limit. For (b|y)-2008, I implemented a mechanism that stores most of the userdata at an HTTP-accessible location, so that the length-limited element can simply download what it needs after instantiation. If more than one platform can share the same userdata configuration, create appropriately named symbolic links for these. For example, y-2008.user-data is a symbolic link to b-2008.user-data.
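To make the configuration fields described above more concrete, here is a minimal Python sketch. It uses only the example values quoted in this section (the subnet and security group lists are cut down to the single ids shown). The nesting of the JSON (hostname at the top level, one section per region) and the helper names on_demand_hostname, user_data_filename and is_windows are assumptions for illustration, not cloud-tools code; compare against a real file in the configs folder before relying on this layout.

```python
import json

# Example values are copied from the field descriptions above. The nesting
# (hostname at the top level, one section per region) is an assumption.
CONFIG_NAME = "b-2008"
config = {
    "hostname": "b-2008-ec2-%03d",
    "us-east-1": {
        "type": "b-2008",
        "domain": "build.releng.use1.mozilla.com",
        "ami": "ami-1d47b276",
        # Only the ids quoted above; the real lists are longer.
        "subnet_ids": ["subnet-2ba98340"],
        "security_group_ids": ["sg-e758e982"],
        "instance_type": "c3.2xlarge",
        "distro": "win2008",
    },
}


def on_demand_hostname(cfg, number):
    """Expand the printf-style hostname pattern with an instance number."""
    return cfg["hostname"] % number


def user_data_filename(config_name):
    """The userdata file shares the config file's name plus a suffix."""
    return config_name + ".user-data"


def is_windows(region_cfg):
    """Hypothetical helper mirroring the distro.startswith('win') check."""
    return region_cfg["distro"].startswith("win")


if __name__ == "__main__":
    print(on_demand_hostname(config, 7))
    print(user_data_filename(CONFIG_NAME))
    print(is_windows(config["us-east-1"]))
    print(json.dumps(config, indent=4, sort_keys=True))
```

Calling on_demand_hostname(config, 7) shows how %03d pads the instance number to three digits. A us-west-2 section would follow the same shape as the us-east-1 one, with its own AMI, subnets and security groups.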
The AWS manager servers host a number of scheduled (cron) jobs which produce daily Golden AMIs in EC2. You can see these AMIs by filtering for Name: spot-* in the EC2 Console under AMIs.
- Either create your own AWS management environment, or work from one of the existing Mozilla aws-manager instances.
- Adapt and run a script like this: https://gist.github.com/grenade/147ca1cae1c70102ddf9#file-create-ami-sh to create your new platform golden AMI. Note that this script sets up DNS entries for your golden AMI instance. You may need to delete these entries later (if you screw up, or rename things) using either invtool or the inventory web application: https://inventory.mozilla.org/
- Add cron jobs for the platforms you need new golden AMIs for. For reference, the (b|y)-2008 jobs were added to Puppet with this merge: https://hg.mozilla.org/build/puppet/rev/c6dbdccb3fa5
slavealloc.py and watchpending.cfg need to know about your new platform and when/how to spin up spot instances. I'll document this shortly, but for now (b|y)-2008 was done like this:
- https://github.com/mozilla/build-cloud-tools/commit/363f72b (introduce platforms)
- https://github.com/mozilla/build-cloud-tools/commit/6741638 (set spot limits)
The slave-alloc databases need to know about your new platform and what to name spot instances. Create a CSV file similar to this:
name,basedir,distro,bitlength,purpose,datacenter,trustlevel,speed,environment,pool
b-2008-spot-001,c:\builds\moz2_slave,win2k8,64,build,us-east-1,core,c3.2xlarge,prod,build-use1
b-2008-spot-002,c:\builds\moz2_slave,win2k8,64,build,us-west-2,core,c3.2xlarge,prod,build-usw2
y-2008-spot-001,c:\builds\moz2_slave,win2k8,64,build,us-east-1,try,c3.2xlarge,prod,try-use1
y-2008-spot-002,c:\builds\moz2_slave,win2k8,64,build,us-west-2,try,c3.2xlarge,prod,try-usw2
Note that:
- the CSV file contains the column headers; they are required
- a row is required for every possible spot hostname you envisage needing
- datacenter: use the long name (e.g. us-east-1 | us-west-2)
- trustlevel
  - build: core
  - try: try
- pool
  - build: build-use1 | build-usw2
  - try: try-use1 | try-usw2
- speed is actually unused (by cloud-tools) and is just here to show factual output in the various slave-alloc UIs
- basedir: I have no idea what uses this. I suspect it's important, but need to learn why.
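Because a row is needed for every spot hostname, it can be convenient to generate the CSV rather than type it by hand. The following is a minimal Python sketch that reproduces the column layout shown above, using the datacenter, trustlevel and pool values listed in the notes. The names spot_rows and build_csv, the per-platform count argument, and the choice to alternate regions across hostname numbers are all assumptions for illustration; none of this is part of slavealloc or cloud-tools.

```python
import csv
import io

COLUMNS = ["name", "basedir", "distro", "bitlength", "purpose",
           "datacenter", "trustlevel", "speed", "environment", "pool"]

# Per-platform values taken from the example rows and notes above.
PLATFORMS = {
    "b-2008": {"trustlevel": "core", "pool_prefix": "build"},
    "y-2008": {"trustlevel": "try", "pool_prefix": "try"},
}

# Long region name paired with the short suffix used in pool names.
REGIONS = [("us-east-1", "use1"), ("us-west-2", "usw2")]


def spot_rows(platform, count):
    """Yield one row per spot hostname, alternating between regions."""
    settings = PLATFORMS[platform]
    for number in range(1, count + 1):
        datacenter, short = REGIONS[(number - 1) % len(REGIONS)]
        yield {
            "name": "%s-spot-%03d" % (platform, number),
            "basedir": r"c:\builds\moz2_slave",
            "distro": "win2k8",
            "bitlength": "64",
            "purpose": "build",
            "datacenter": datacenter,
            "trustlevel": settings["trustlevel"],
            "speed": "c3.2xlarge",
            "environment": "prod",
            "pool": "%s-%s" % (settings["pool_prefix"], short),
        }


def build_csv(counts):
    """Return CSV text (header included) for a {platform: count} mapping."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for platform, count in counts.items():
        writer.writerows(spot_rows(platform, count))
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_csv({"b-2008": 2, "y-2008": 2}), end="")
```

Redirecting the output to a file such as my-new-platform.csv gives something that can be reviewed by eye before it is imported.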
There are two different slavealloc databases: one for staging, the other for production. The mysql connection string you use will determine which environment your spot instances run in.
Import your csv data to slavealloc by ssh'ing to a server that mysql will accept connections from. There are not a lot of these and I have seen evidence that we intentionally don't document this step publicly. Talk to someone in releng/relops about where to get slavealloc mysql credentials and what servers you can connect from, then run a command like this to do the import:
slavealloc dbimport -D mysql://connection-string-goes-here --slave-data my-new-platform.csv
Connect to the mysql database and adapt/run a query like this to visually compare your newly added instances to any pre-existing similar instances:
select * from slaves where name like '%-2008-%';
Fix any obvious problems, using appropriate SQL, possibly similar to:
update slaves set distroid=11 where name = 'y-2008-spot-002';
Enable your spot instances with SQL similar to this (correct the selector for your platform):
update slaves set enabled=1 where name like '%-2008-spot-%';
Slave health needs to know about your new platform. When I implemented the changes for (b|y)-2008, I merged fixes for other problems in slave_health, so that commit is less useful to anyone who just wants to see the minimum required to add a platform class. Instead, take a look at this one by Callek, for Win 10 testers: https://bugzilla.mozilla.org/attachment.cgi?id=8644152&action=diff
buildbot-configs needs to know about your new platform. I did this for (b|y)-2008: https://github.com/mozilla/build-buildbot-configs/commit/c9d7e86. Callek later had to make a correction to include some number ranges I missed.