If you're anything like the rest of us, you're suddenly being asked to make a lot of unplanned infrastructure changes and you haven't been given a lot of time to prepare. You're probably looking into some kind of automation so that you can keep up.
I've got a few tips on getting started in a way that's less prone to missteps, especially if this is new ground for your team. At the end is a short tutorial on using Bolt for quick and easy automation wins.
- Make a plan, but allow for course corrections.
- Start small and iterate.
- Automate the low-hanging fruit only.
- Document, document, document!
It's tempting to jump in with both feet and just start working. There's a lot to do and no time to waste. Unfortunately, this eagerness often leads to mistakes, like automating the wrong thing or configuring things incorrectly. Sometimes it even leads to disasters when poorly planned infrastructure isn't able to handle the load thrown at it. But at the same time, planning out the minutiae consumes precious time that you'll never get back.
I suggest a middle-of-the-road approach. Sketch out a plan of action. Choose the technologies you're going to use. Assess potential pitfalls and roadblocks. Assign work to team members, with clearly delineated interoperation agreements. For example, this is not the time for one team to build single sign-on with GitHub OAuth and for another team to integrate with your existing LDAP service.
Plan for changes, though. You're likely working with incomplete information, and the environment you're working in is changing continuously, even daily. For example, your company may suddenly pivot to providing virtual services or home delivery whether or not you've got the infrastructure to support it. So anticipate these direction changes and adapt. Plan daily virtual standups to keep the team in sync and clearly communicate when your changes will affect other team members.
As you're building your plan, remember that you cannot do it all at once. There's an overused phrase in the industry that says "don't try to boil the ocean." This means that if you invest a large amount of time and effort into a single all-inclusive rollout, then that rollout has a much higher chance of failure than if you'd taken a slower iterative approach.
Instead, I suggest that one of your first milestones be ensuring that all machines in your infrastructure, from servers to individual laptops, have a configuration management agent like Puppet installed, whether or not it's actually managing anything yet. These agents report back information about each machine, which gives you insight into the true state of your infrastructure and lets you center your plan on data rather than assumptions. Then, when you're ready for it, you can incrementally roll out configuration policies as they're written.
Automating only the low-hanging fruit might sound odd coming from an automation company. But when time is of the essence, it's the large number of simple tasks that will give you the best return on time investment. Large or tricky automation jobs do have a big impact, but they also take a lot of time to build, test, and troubleshoot. When you've got a tool like Bolt that lets you easily scale your existing processes to a whole fleet at once, the cumulative time savings from starting with the little things add up quickly. See more about Bolt below.
Wait to tackle the complex jobs until you've got the time to do them right. For now, use the time you save on the low-hanging fruit to make sure that your manual processes are successful.
As I said above, even the best-laid plans are subject to change. This means that you're going to be making a lot of decisions that are effectively going to be codified into practice, at least for the time being. Make sure you document everything, especially the rationale behind each decision. As you take your notes, make sure that they'll be understandable when you go back to turn them into documentation. When future you or other team members need to troubleshoot or refactor, this will be invaluable.
Often, sysadmins will have a list of shell commands that they run, or a collection of small shell scripts used to provision and configure machines. The challenge when using those at scale is consistency. How do you ensure that you and all your team members run all the commands, in the same order, each time? Copying and pasting from a wiki mostly works when you're building a single machine and have time to go back and check your work, but that certainly doesn't scale to hundreds of machines.
But those scripts still have value. Bolt can help you reuse that existing knowledge and scale it up to your new challenge. Instead of learning a new language and rewriting everything all at once, Bolt lets you use your existing scripts with little or no modification. In other words, it can help you quickly ramp up these easy parts and leave you more time for the hard problems. I'll show you the basics here.
Bolt runs on your own workstation, so you'll first want to install the package for your operating system.
Then let's try out a remote command. We'll use the "remote" host of localhost for validation first. Notice the nested quotes. That's because we're passing the entire string to the remote machine as a command to run.
$ bolt command run "echo 'hello world'" --targets localhost
Started on localhost...
Finished on localhost:
STDOUT:
hello world
Successful on 1 target: localhost
Ran on 1 target in 0.01 sec
That worked pretty well. Now let's try it on a real remote host. Choose the address of a machine that you can SSH into and run the command again using that address for the target. This time you'll need to pass login credentials, or you can see the docs for other authentication methods, such as configuring automatic key-based SSH login in your ~/.ssh/config.
$ bolt command run "echo 'hello world'" --targets 172.16.196.129 --user root --password hunter2
Started on 172.16.196.129...
Finished on 172.16.196.129:
STDOUT:
hello world
Successful on 1 target: 172.16.196.129
Ran on 1 target in 0.51 sec
You can pass as many targets as you like in a comma-separated list. But eventually that's going to get tedious, so let's configure an inventory file to simplify this.
Create a YAML file at ~/.puppetlabs/bolt/inventory.yaml that looks like this:
---
version: 2
groups:
  - name: linux
    targets:
      - <hostname-1>
      - <hostname-2>
  - name: windows
    targets:
      - winrm://<hostname-1>:55985
      - winrm://<hostname-2>:55985
config:
  ssh:
    user: <username>
    # use either password or private-key
    password: <password>
    #private-key: ~/.ssh/id_rsa
  winrm:
    user: <username>
    password: <password>
    ssl: false
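If you already keep your hostnames in a list somewhere, typing them into the groups section by hand gets old fast. Here's a rough sketch of generating that part of the file with a few lines of shell. The function names and the example hostnames are my own inventions for illustration, and it only writes a single group called linux; you'd still add the config section with your credentials yourself.

```shell
#!/bin/sh
# Sketch only: function names and hostnames are hypothetical.

# inventory_group NAME HOST...
# Prints one group entry, indented the way the inventory file expects.
inventory_group() {
    name=$1
    shift
    printf '  - name: %s\n' "$name"
    printf '    targets:\n'
    for host in "$@"; do
        printf '      - %s\n' "$host"
    done
}

# make_inventory HOST...
# Prints the file header followed by a single "linux" group.
make_inventory() {
    printf '%s\n' '---'
    printf 'version: 2\n'
    printf 'groups:\n'
    inventory_group linux "$@"
}
```

Calling `make_inventory web01.example.com web02.example.com` prints the header and one group with both hosts as targets, which you can redirect into your inventory file and then finish off by hand.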
Now you can run that same command on all your nodes at once by passing the group name as a target!
$ bolt command run "echo 'hello world'" --targets linux
Started on 172.16.196.129...
Finished on 172.16.196.129:
STDOUT:
hello world
Successful on 1 target: 172.16.196.129
Ran on 1 target in 0.72 sec
Excellent work! Now it's time to try that with a shell script. For this example, I'm using the bashcheck script to do a quick shell vulnerability assessment. Save that script to your local directory, or use any other shell script you'd like.
$ bolt script run bashcheck.sh --targets linux
Started on 172.16.196.129...
Finished on 172.16.196.129:
STDOUT:
Testing /usr/bin/bash ...
Bash version 4.2.46(2)-release
Variable function parser pre/suffixed [(), redhat], bugs not exploitable
Not vulnerable to CVE-2014-6271 (original shellshock)
Not vulnerable to CVE-2014-7169 (taviso bug)
Not vulnerable to CVE-2014-7186 (redir_stack bug)
Test for CVE-2014-7187 not reliable without address sanitizer
Not vulnerable to CVE-2014-6277 (lcamtuf bug #1)
Not vulnerable to CVE-2014-6278 (lcamtuf bug #2)
Successful on 1 target: 172.16.196.129
Ran on 1 target in 0.92 sec
As you can see, not only is my test machine already patched against these vulnerabilities, but Bolt does all the work of transferring the script to the host nodes and running it for you. All you need to do is ensure that the host machines have an interpreter capable of running the script. Pro tip: if you set the shebang line properly, you can use any scripting language you'd like. If you're targeting Windows hosts, you can run PowerShell scripts by using a .ps1 file extension.
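Scripts that configure machines tend to get run more than once across a fleet, so it helps to write them so that a second run changes nothing. Here's a minimal sketch of that idea. It's my own illustration rather than anything from the Bolt docs: the ensure_line function, the script name, and the file path and setting mentioned afterward are all hypothetical.

```shell
#!/bin/sh
# Sketch only: ensure_line is a hypothetical helper, not part of Bolt.

# ensure_line FILE LINE
# Appends LINE to FILE only if an identical line isn't already there,
# so running the script again leaves the file unchanged.
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the write fails, its non-zero status becomes the function's status.
    printf '%s\n' "$line" >> "$file"
}

# When run with arguments, act on them and exit non-zero on failure
# so the run is reported as failed for that target.
if [ "$#" -gt 0 ]; then
    if [ "$#" -ne 2 ]; then
        echo "usage: $0 FILE LINE" >&2
        exit 2
    fi
    ensure_line "$1" "$2" || exit 1
fi
```

Saved as ensure_line.sh, this could be run with something like `bolt script run ensure_line.sh /etc/example/app.conf 'max_connections = 200' --targets linux`, assuming your Bolt version passes the arguments after the script name through to the script. Because it exits non-zero when it can't write the file, that target should show up in the failures rather than quietly succeeding.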
Now you know just enough to be dangerous, and it only took 5-10 minutes of reading and experimenting. You've now got the ability to easily run commands and scripts across your entire infrastructure. I know that you're hard-pressed for time right now, so start with this. Write shell scripts to configure your machines as needed and then let Bolt run them across your whole fleet. You'll get a report back of successes and failures, so you'll know exactly which machines need more attention. And then when you've got more time and are ready to learn more, come back and try the hands-on lab to learn some of the more advanced features!
- Work through the Bolt hands-on lab
- Read the Bolt docs
- Watch Mike Stahnke's old-but-still-good presentation on getting started with Puppet