Terraform Modules

When developing new modules, or editing existing ones, the following standards should be met to ensure a consistent workflow is used.

Stop! Do you need to write a module?

Before we delve into the rules surrounding modules, consider the following: do you need to write a module to begin with? You should consider copying code before you consider making a module - more often than not, the code you’re writing doesn’t need to be a module.

This is a big one because modules are great and it’s easy to get carried away with them, but they’re also not the ideal choice in most cases. There are some (loose) questions regarding modules that should be answered before you begin development…

Are you managing at least five resources?
Do you have a need to use the module more than three times right now?
Are you just trying to organise code?
Are you willing to place this module/code into its own repository AND implement and manage all the security that comes with that task?

In most cases code that you’re repeating once or twice can just be copied and pasted. The costs associated with creating a module aren’t always obvious:

Every resource inside your module can no longer be directly configured, so you have to expose some or all of its properties via variables (inputs.tf), which is probably more repetition than just copying the code a few times
- (You also have to provide an output{} for all the attributes too)
The variables and functionality you expose with a module constitute an API, and APIs are fragile - if the underlying systems they abstract change, your API breaks and so does any code using your module (which is why version pinning is important)
There is a learning curve associated with a module: what inputs does it take? What outputs will it give me? Can I safely use this module multiple times in one state?
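
As a rough illustration of the first cost above, here is a minimal sketch of what wrapping a single S3 bucket in a module involves. The bucket resource, variable names and output name are hypothetical and only serve to show how much wiring is needed around one resource.

```hcl
# inputs.tf (inside the hypothetical module)
# Every property the caller should control has to be re-declared here.
variable "bucket_name" {
  type        = string
  description = "Name of the S3 bucket to create."
}

variable "tags" {
  type        = map(string)
  description = "Tags to apply to the bucket."
  default     = {}
}

# s3.tf
# The resource itself is only a few lines long.
resource "aws_s3_bucket" "logs" {
  bucket = var.bucket_name
  tags   = var.tags
}

# outputs.tf
# Attributes are not visible to callers unless they are re-exported.
output "bucket_arn" {
  description = "ARN of the bucket, re-exported for callers."
  value       = aws_s3_bucket.logs.arn
}
```

In this sketch the wrapper is several times longer than the resource it hides, which is the repetition the list above warns about.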

Before writing your code as a module stop and consider if you can’t simply copy/paste some code instead.

tl;dr

The following rules apply and are broken down into more detail below:

All module inputs should be stored in inputs.tf
All module outputs should be stored in outputs.tf
Any locals{} that are required should go in locals.tf
All variable{}s should use input validation and type constraints
All variable{}s should have a description property defined
Any variable{}s that are defined should go in variables.tf
Filenames should be named after the service they manage (ec2.tf, rds.tf)
Filenames may include a specific service name such as ami if the name ec2.tf is too broad, for example ec2_ami.tf for code that manages AMIs
Resource names should never contain the type of resource they’re managing - it’s in the resource type already
Using dashes (-) in strings is preferred whenever it’s possible
Modules should always be stored in their own git repository
Git repositories should make use of semantic versioning and use git tags to mark release candidates
All modules require a README.MD file explaining how to use the module effectively

Below we’ll break down the rules a bit more and provide examples (when applicable or appropriate.)

All module inputs should be stored in `inputs.tf`

Providing an inputs.tf file makes it clear and obvious where an engineer has to look to review or manage a module’s variable{} code.

All module outputs should be stored in `outputs.tf`

Providing an outputs.tf file makes it clear and obvious where an engineer has to look to review or manage a module’s output{} code.
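
A minimal sketch of what such a file might contain is shown below. It assumes the module defines a resource named aws_instance.web in another file (for example ec2.tf); that resource name and both output names are hypothetical.

```hcl
# outputs.tf
# Assumes aws_instance.web is declared elsewhere in the module (hypothetical).

output "instance_id" {
  description = "ID of the EC2 instance created by this module."
  value       = aws_instance.web.id
}

output "private_ip" {
  description = "Private IP address of the EC2 instance."
  value       = aws_instance.web.private_ip
}
```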

Any `locals{}` that are required should go in `locals.tf`

Any locals{} that are defined should be stored in a central location called locals.tf so that it’s easy to locate, review and manage such information.
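
For example, a locals.tf might look something like the sketch below. The project and environment variables, and the local names, are assumptions made for illustration.

```hcl
# Hypothetical inputs used by the locals below.
variable "project" {
  type        = string
  description = "Short name of the project."
}

variable "environment" {
  type        = string
  description = "Deployment environment, such as dev or prod."
}

# locals.tf
# Derived values live in one place instead of being repeated per resource.
locals {
  name_prefix = "${var.project}-${var.environment}"

  common_tags = {
    Project     = var.project
    Environment = var.environment
  }
}
```

Resources can then refer to local.name_prefix and local.common_tags rather than rebuilding the same strings and maps in several files.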

All `variable{}`s should use input validation and type constraints

Terraform affords us the ability to validate input as of 0.13 (in later 0.12 releases it was only available as an opt-in experiment). This grants us some obvious benefits but also the not-so-obvious benefit of saving time: instead of waiting for the AWS or Azure APIs to tell us a field’s value (such as an AMI ID, or a Key Vault secret name) is incorrect, we’ll learn of this fact locally, when Terraform evaluates the configuration, before any request is sent to the provider.

Using validation is easy:

```hcl
variable "image_id" {
  type        = string
  description = "The id of the machine image (AMI) to use for the server."

  validation {
    # regex(...) fails if it cannot find a match
    condition     = can(regex("^ami-", var.image_id))
    error_message = "The image_id value must be a valid AMI id, starting with \"ami-\"."
  }
}
```

We can also provide our variable with a type constraint. This further allows Terraform to catch problems locally, before anything is applied, and also acts as a form of documentation. An example of this can be seen above with the type property.
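
Type constraints are not limited to string. The sketch below shows a number with a range check and a structured list(object(...)) type; the variable names and the limits are hypothetical.

```hcl
variable "instance_count" {
  type        = number
  description = "Number of web servers to launch."
  default     = 2

  validation {
    # Reject values outside an (arbitrary, illustrative) range.
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "The instance_count value must be between 1 and 10."
  }
}

variable "ingress_rules" {
  # Each rule must supply both attributes with the right types.
  type = list(object({
    port        = number
    cidr_blocks = list(string)
  }))
  description = "Ingress rules to open on the security group."
  default     = []
}
```

With the object type in place, a caller who passes a rule without cidr_blocks, or with a value that cannot be converted to a number for port, gets an error from Terraform itself rather than from the provider API.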

All `variable{}`s should have a description property defined

This should be self explanatory - it documents the purpose of the variable.

Any `variable{}`s that are defined should go in `variables.tf`

This is about discovery again and is similar in concept to inputs.tf and outputs.tf - it makes it easy to find where variables are being defined as opposed to hunting around loads of files or using an IDE’s go-to-reference capabilities (which can take people out of their flow.)

Filenames should be named after the service they manage (`ec2.tf`, `rds.tf`)

Being able to locate the configuration for an Auto Scaling Group by going to ec2_asg.tf is obviously easier than trying to find it among many ambiguous files. This is also about discoverability and making it easy to find the code you’re looking for.

Filenames may include a specific service name such as ami if the name `ec2.tf` is too broad, for example `ec2_ami.tf` for code that manages AMIs

Files can get big quickly if you’re using a lot of EC2 services and you’ve got them all in ec2.tf. Instead it’s acceptable to break them out into individual files such as ec2_ami.tf, vpc_subnets.tf, vpc_sg.tf and so on. This is a better option than using sub-directories (which are Terraform modules by definition) to organise code, which just results in modules within modules and causes a lot of extra work and problems further down the line.

Resource names should never contain the type of resource they’re managing - it’s in the resource type already

Writing resource "aws_instance" "ec2-web-server" is pointless because the same information is present in the resource type - aws_instance - so we already know it’s an EC2 resource. The same goes for virtually every other resource type.
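
A short sketch of the difference, with hypothetical names and a hypothetical instance size, reusing the image_id variable from the validation example earlier:

```hcl
variable "image_id" {
  type        = string
  description = "The id of the machine image (AMI) to use for the server."
}

# Redundant: "ec2" repeats what the aws_instance type already says.
# resource "aws_instance" "ec2-web-server" { ... }

# Preferred: the name only describes the role of the resource.
resource "aws_instance" "web-server" {
  ami           = var.image_id
  instance_type = "t3.micro"
}
```

The resource address then reads aws_instance.web-server rather than aws_instance.ec2-web-server.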

Remember that all the information you need about resources is available from the terminal using Terraform commands such as terraform state list and terraform state show.

Using dashes (`-`) in strings is preferred whenever it’s possible

This makes it easier to move around the code using keyboard shortcuts. For example using Alt+<Arrow Keys> on the following strings yields different results:

my-web-server will allow the cursor to move from the very beginning of the string to the beginning of the next word, web
but the string my_web_server pushes the cursor to the end of the string, making it more difficult and time consuming to edit the word web inside of the string

This is a small detail but when you’re editing this code every day it quickly becomes very convenient to move between words regardless of your IDE choice.

Modules should always be stored in their own git repository

If you’re going to write a module and just keep it alongside the code that imports the module then you’re best not even using a module to begin with. The sole purpose of modules existing as a concept is that they can be shared with others - sticking them in a sub-directory isn’t sharing, it’s organising code.

When we agree that some code is repeated often enough and is going to be encapsulated in a module we must keep the module in its own repository. This comes with many advantages:

You can version control the individual module and have code bases pull the module based on a specific version;
- This is good practice and prevents breaking changes to legacy environments;
- Others can also pin their use of your module to a specific version;
You can control access to the module, locking it down to specific users;
- So a module aimed at managing IAM policies should be locked down to security personnel versus allowing anyone to edit their own access to the system;
It becomes a lot easier to share with others given its isolated nature;
You can use a private CI pipeline for the module to do tests and linting on changes;
You get a separate commit log for easy auditing of who did what, when, etc;

But the most important advantage is the sharable nature of a module in a git repository.
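
As a sketch of what version pinning against a git repository can look like, the block below pulls a module at a specific tag using the ref query argument of a git source. The repository URL, the module name, the tag and the vpc_cidr input are all hypothetical.

```hcl
module "network" {
  # "ref" pins this code base to one tagged release of the module.
  source = "git::https://example.com/acme/terraform-aws-network.git?ref=v1.2.0"

  # Hypothetical input exposed by the module.
  vpc_cidr = "10.0.0.0/16"
}
```

Moving to a newer release is then an explicit change to the ref value, so legacy environments keep working until someone chooses to upgrade them.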

All modules require a `README.MD` file explaining how to use the module effectively

Because documentation is important. At minimum the README should contain:

What the module manages for the user
What inputs it takes and which have default values (and what that value is)
What attributes it exposes
Who wrote it and how they can be contacted

And anything else that you believe can assist others to better understand the nature and function of the module.

mrcrilly/terraform_modules.md
