Testing is a common practice for software teams and has evolved tremendously over the last two decades. Test Driven Development (TDD), unit testing, integration testing, acceptance testing: you name it, there is a testing pattern for it. However, less attention is paid to infrastructure testing. It's a more nebulous topic, and it's often difficult to know where it fits into your development process. Are the developers responsible for writing these tests, or the folks deploying the software? What if it's the same team and you're following a more "DevOps" model?
We'll attempt to answer some of those questions and more by showcasing a common infrastructure testing tool, InSpec, and patterns your team can adopt to test your infrastructure the way you test your software.
InSpec is an open-source framework, written and maintained by Chef, for auditing and testing your applications and infrastructure. It works by comparing the state of a system or component to the desired state you express in the InSpec DSL (Domain Specific Language).
If you're familiar with RSpec or similar Behavior Driven Development style tooling, InSpec will feel familiar in its style of writing tests. The intent is to be as human (and machine) readable as possible.
A simple example is testing that the telnet service is not installed, as shown below.
describe package('telnetd') do
  it { should_not be_installed }
end
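To run a test file like this, you pass it to inspec exec. You can check the local machine or point at a remote host over SSH; a minimal sketch, where the file name, host address, and key path are placeholders:

# Check the machine you're on
inspec exec telnet_test.rb

# Check a remote host over SSH
inspec exec telnet_test.rb -t ssh://ec2-user@203.0.113.10 -i ~/.ssh/mykey.pem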
InSpec supports a wide variety of resources, from popular public cloud APIs (such as AWS, GCP, and Azure) to most of the Linux host-level functionality you'd want to validate. Some resources are included in the base install and others are distributed via resource packs, such as inspec-aws.
Let's review a few common use cases.
Validate that your EC2 instance exists, is running, and is using the AMI ID ami-12345:
describe aws_ec2_instance(name: 'testWeb') do
  it { should be_running }
  its('image_id') { should eq 'ami-12345' }
end
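Depending on your InSpec version, AWS resources like this one ship in the base install or come from the inspec-aws resource pack, declared as a dependency in your profile's inspec.yml. Either way, these tests run against the AWS API rather than a host, so you point inspec at an AWS target; a sketch, with the region and named credential profile as placeholders:

# Run the profile in the current directory against the AWS API
inspec exec . -t aws://us-east-1/my-aws-profile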
The port resource lets you validate that a given port is or is not listening, and also test the service that is listening on that port. For example, validate that syslog is listening on port 514:
describe port(514) do
  it { should be_listening }
  its('processes') { should include 'syslog' }
end
Alternatively, make sure we're not listening on port 80 and are only accepting TCP connections on a secure port, 443:
describe port(80) do
  it { should_not be_listening }
end

describe port(443) do
  it { should be_listening }
  its('protocols') { should cmp 'tcp' }
end
Using the service resource, you can validate that a given service is installed, running, and/or enabled. This is helpful when you're setting up services that you want to run on instance boot, like a logging agent such as fluentd.
describe service('fluentd') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end
This may be a less common use case now that Container-as-a-Service platforms have become more popular, but there may be times when you'd like to validate that a set of Docker containers is up and healthy on a host.
docker.containers.running?.ids.each do |id|
  describe docker.object(id) do
    its('State.Health.Status') { should eq 'healthy' }
  end
end
Writing a single test file with InSpec is great when you're first starting out, but ideally you'll want to include these tests in your automated deployments. Whether that's through a CI/CD system (Jenkins, Concourse, etc.) or your Ansible playbooks, it's advisable to make these tests part of your build and deployment process where possible.
How you include them is up to your team and development flow, but a few common patterns are (a sample CI invocation follows the list):
- Building machine images (i.e. an AWS AMI) and performing post-build validation
- Running periodic tests on hosts to ensure compliance
- Running validation after a virtual machine launches (i.e. an EC2 instance)
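As a concrete example of the first pattern, a CI step can execute the suite against the newly built host and fail the build on any failing control. A minimal sketch, where the profile path, target host, key, and report file are all placeholders:

# Hypothetical CI step: run the suite over SSH against the freshly provisioned host;
# the junit reporter emits a results file most CI systems can display
inspec exec ./tests -t ssh://ec2-user@10.0.0.5 -i ~/.ssh/ci_key.pem --reporter cli junit:inspec-results.xml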
Some combination of these approaches is likely best to cover all scenarios. Personally, I tend to focus on #1 and #3 above: I'll have a suite of tests run during AMI builds that use a combination of Packer, Ansible, and InSpec. As the last step in the build, InSpec validates that the Ansible scripts installed the correct software versions, enabled the correct services, and so on.
Here is a snippet from a full working example of building an AMI on AWS with Packer and Ansible, performing validation with InSpec during the build. If any of the tests fail, the AMI build fails.
describe service('sshd') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end

describe service('docker') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end

describe port(22) do
  it { should be_listening }
end

describe command('pgrep docker') do
  its('exit_status') { should eq 0 }
end

describe file('/etc/systemd/system/docker.service') do
  it { should exist }
  it { should_not be_directory }
end

describe file('/docker/docker') do
  its('type') { should eq :directory }
  it { should be_directory }
end

describe command('docker') do
  its('exit_status') { should eq 0 }
end

describe command('aws --version') do
  its('exit_status') { should eq 0 }
end

describe file('/etc/ntp.conf') do
  it { should be_file }
  it { should_not be_directory }
end
# inspec exec host_validation.rb

Profile: tests from host_validation.rb (tests from host_validation.rb)
Version: (not specified)
Target:  local://

  Service sshd
     ✔  should be installed
     ✔  should be enabled
     ✔  should be running
  Service docker
     ✔  should be installed
     ✔  should be enabled
     ✔  should be running
  Port 22
     ✔  should be listening
  Command: `pgrep docker`
     ✔  exit_status should eq 0
  File /etc/systemd/system/docker.service
     ✔  should exist
     ✔  should not be directory
  File /docker/docker
     ✔  should be directory
     ✔  type should eq :directory
  Command: `docker`
     ✔  exit_status should eq 0
  Command: `aws --version`
     ✔  exit_status should eq 0
  File /etc/ntp.conf
     ✔  should be file
     ✔  should not be directory

Test Summary: 16 successful, 0 failures, 0 skipped
This test file checks a few things for us:
- SSH is installed and listening on port 22
- Docker is running and expected files/directories are present
- AWS CLI is installed
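In the Packer template itself, the simplest wiring is a final shell provisioner that uploads this test file and runs it on the instance; because Packer aborts the build when a provisioner exits non-zero, any failing control fails the AMI. A rough sketch of the command that provisioner runs (the path is a placeholder, not taken from the example):

# Hypothetical last provisioner step in the Packer build: a failing control
# makes inspec exit non-zero, which aborts the AMI build
sudo inspec exec /tmp/host_validation.rb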
See the full working example, which includes sample Packer build automation scripts, here.