IaC is not what it should be

The core concept of Infrastructure as Code (IaC) is that code is used to manage the infrastructure. In most cases, that code is kept in version control. There are two approaches to IaC: declarative and imperative ("what" versus "how"). The more popular, declarative approach uses definitions of the desired state of the infrastructure. The imperative approach uses scripts or playbooks to define a process: a sequence of steps that must be applied in a certain order to reach the desired state of the infrastructure. From a practical perspective, most non-trivial projects combine declarative and imperative parts, for example Terraform configurations plus scripts or CI/CD workflows that apply those configurations in the right order. Historically, IaC has required specialized formats and languages to define the infrastructure, and specialized tools to apply changes.
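
The difference between "what" and "how" can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the infrastructure is modeled as a hypothetical mapping from service name to replica count, and `plan` and `deploy_imperatively` are made-up names, not part of any real tool.

```python
# Hypothetical model: service name -> number of replicas.
desired = {"web": 3, "worker": 2}
actual = {"web": 1, "cache": 1}


def plan(desired, actual):
    """Declarative style: compare states and derive the steps automatically."""
    actions = []
    for name in sorted(desired):
        current = actual.get(name, 0)
        if current != desired[name]:
            actions.append(("scale", name, current, desired[name]))
    # Anything that exists but is not declared gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("remove", name, actual[name], 0))
    return actions


def deploy_imperatively(state):
    """Imperative style: the author spells out every step and its order."""
    state = dict(state)
    state["web"] = 3
    state["worker"] = 2
    state.pop("cache", None)
    return state
```

In the declarative version only `desired` changes when requirements change. In the imperative version the steps themselves have to be rewritten.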

Configuration, not code

The biggest issue with modern IaC is that the infrastructure definition is actually configuration, not code. For example, YAML, which is popular in the IaC world, is a data serialization language, not a general-purpose programming language. Yet the infrastructure definition is highly dynamic by nature: its complexity grows together with the applications and with new internal and external requirements and changes. Most IaC tools focus on "Infrastructure as Configuration", and this is not enough for real-life projects.

Configuration wants to be code

The most common solution to this issue is to use imperative tools (Bash scripts, Python or Go programs) to introduce additional layers for more flexible configuration and to orchestrate the execution of other IaC tools. This creates unnecessary specialization (or compartmentalization) of devops: it is not uncommon for developers and devops to use different languages for the development and the deployment of the same project. This feels unnatural, as the whole idea of devops is the seamless integration of dev and infra people in the same team. Many teams feel this frustration and are trying to bring the infra part of the project back from the configuration realm into the code realm. However, many attempts to do so have made the issue even worse: adding a semantic layer to YAML and turning it into a Turing-complete language, or adding new conditional and rendering functions to an established language such as HCL. Some popular tools had to accept the necessary "evil" of additional semantics to achieve the flexibility they need. For example, Helm and Ansible use full-blown (although de facto standard) templating engines to render YAML. Helmfile, a tool to orchestrate Helm charts, has gone even further: it uses templated YAML to render templated Helm values.
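
One reason text templating of configuration is fragile can be shown with a small sketch. To stay within the Python standard library it uses JSON and `string.Template` as stand-ins for YAML and a real templating engine, so it is an analogy rather than a reproduction of how Helm or Ansible behave. The function names are hypothetical.

```python
import json
from string import Template

# A text template knows nothing about the structure it is producing.
TEMPLATE = Template('{"name": "$name", "replicas": $replicas}')


def render_with_template(name, replicas):
    """Substitute values into text; quoting is the author's problem."""
    return TEMPLATE.substitute(name=name, replicas=replicas)


def render_with_code(name, replicas):
    """Build a data structure and let the serializer handle quoting."""
    return json.dumps({"name": name, "replicas": replicas})


def is_valid(document):
    """Return True if the rendered document parses as JSON."""
    try:
        json.loads(document)
        return True
    except json.JSONDecodeError:
        return False
```

With a name such as `my "web"`, the templated document no longer parses, while the version built from data structures still does.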

How did we get here

The idea of using a specialized configuration format comes from the early days, when simple small programs had to use configuration files to achieve the required flexibility. A user or admin did not need anything special, only a text editor, to adjust a program's configuration. Hundreds of configuration formats made the lives of infra people miserable, but help came from developers in the form of standard formats such as XML, JSON, YAML, and TOML. However, this standardization focuses on syntax only and keeps semantics out of scope. For example, Terraform HCL provides a unified syntax layer for defining resources in multiple systems and clouds, but you still need to know and use the low-level primitives that are specific to each system or cloud.

DSL as configuration

The problem is not new, and some tools have successfully achieved the required level of configuration flexibility with an interesting technique. Examples:

Sendmail and Autoconf configurations are m4 macros.
Bazel uses Starlark, a dialect of Python, in its build files.
Gradle and Jenkins use Groovy.

While these examples are very specific, niche tools, they have something in common. They use a general-purpose language (Python, Groovy) to define a domain-specific language (DSL) that covers most of the cases. For advanced cases, a user can use the full power of the embedded general-purpose language to implement missing features and achieve the required configuration flexibility. It is worth mentioning that a configuration file written in the DSL is a valid program in the corresponding language, executed in a certain context (default imports, functions, global variables).
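
The idea of a configuration file being a valid program executed in a prepared context can be sketched as follows. This is a toy example, not how Bazel, Gradle, or Jenkins load their files: the `service` function, the `load` helper, and the configuration contents are all hypothetical.

```python
# The "configuration file": plain Python that only uses names
# provided by the context it is executed in.
CONFIG = '''
service("web", image="nginx", replicas=2)
for region in ["eu", "us"]:
    service(f"worker-{region}", image="worker", replicas=1)
'''


def load(config_source):
    """Execute a configuration in a context that defines the DSL."""
    services = []

    def service(name, image, replicas=1):
        # The DSL primitive: each call declares one service.
        services.append({"name": name, "image": image, "replicas": replicas})

    context = {"service": service}  # names visible to the configuration
    exec(config_source, context)
    return services


services = load(CONFIG)
```

The first line of `CONFIG` reads like a declarative definition, while the loop shows the general-purpose language being used where the DSL alone would be repetitive. Note that `exec` runs arbitrary code, so a sketch like this is only suitable for trusted configuration.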

In general, DSL-based configurations are harder to get started with than, for example, a YAML file, but the final result can be more desirable because it integrates seamlessly into the whole process.

Other solutions that are worth mentioning:

Pulumi provides SDKs for several popular languages to manage cloud resources.
AWS Cloud Development Kit (AWS CDK) provides basic building blocks to manage AWS cloud resources.
Google Cloud SDK provides tools and SDKs to facilitate cloud development.

While these solutions help the IaC ecosystem in general, none of them completely solves the problem above. Pulumi still operates on low-level primitives, so its semantic expressiveness is equal to that of Terraform. AWS CDK and Google Cloud SDK are each locked to a specific cloud and cannot be used as a general-purpose IaC tool, only potentially as a part of one.

Taking everything into consideration, what would be the right approach for IaC going forward?

Use the same programming language as the development team. All team members should be able to understand, contribute to, and maintain the infra code.
Integrate the existing proven tools, such as Terraform. Provide a high-level interface, but do not hide the execution pipeline from the developer: if a Terraform configuration is generated at a certain step, it should be available for analysis and troubleshooting.
Take a top-down approach: the language should provide top-level declarative building blocks that can be extended with imperative code in a general-purpose language.
Unified multi-layered configuration as code: if there is IaC to deploy an application, it should be parameterized and usable as a building block in a more complex deployment process.
IaC as a single point of truth: the code defines the desired state of the infrastructure, anyone on the team can get the delta between the desired state and the actual state, and the code is used to generate the latest documentation for the infrastructure.
Use open-source tools only: this enables broader adoption, attracts community contributors, and makes it easier to find and attract talent.
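
Several of the points above can be illustrated together in a minimal sketch. It assumes a hypothetical high-level building block, `static_site`, that expands into Terraform's JSON configuration syntax (the kind of content Terraform reads from `.tf.json` files). `aws_s3_bucket` is a real resource type, but the arguments, naming scheme, and the `render` and `delta` helpers are illustrative assumptions, not a recommended layout.

```python
import json


def static_site(name, environments):
    """Hypothetical high-level block: one storage bucket per environment."""
    buckets = {}
    for env in environments:
        buckets[f"{name}_{env}"] = {
            "bucket": f"{name}-{env}",
            "tags": {"app": name, "env": env},
        }
    return {"resource": {"aws_s3_bucket": buckets}}


def render(config):
    """Keep the generated configuration inspectable instead of hiding it."""
    return json.dumps(config, indent=2, sort_keys=True)


def delta(desired, actual):
    """Compare declared resource names with the names that actually exist."""
    return {
        "missing": sorted(desired - actual),
        "unexpected": sorted(actual - desired),
    }


config = static_site("docs", ["dev", "prod"])
declared = set(config["resource"]["aws_s3_bucket"])
```

Here `static_site` is parameterized and can be reused as a block in a larger deployment, `render(config)` exposes the intermediate Terraform configuration for troubleshooting, and `delta(declared, existing)` reports the difference between the desired and the actual state, where `existing` would come from whatever inventory the team has.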

pbchekin/01-iac-is-not-what-it-should-be.md
