The jungle I live in

"Those that know, do. Those that understand, teach."

Aristotle (supposedly)

Scope: Building and running web-scale distributed cloud systems with container technologies :P

Learning goals

Establish the challenges for engineering cloud-native systems (the problem space).
Establish the approaches to address those challenges (the solution space).
Establish a taxonomy for this segment of the software development industry (the domain).
Define evaluation criteria for tools/products that address the challenges (the solution space).
Assess the landscape of the solution space and how its tools/products fit the market.

Don't build a space pen (or use it) unless it's an essential part of the problem.

Taxonomy

A cloud native application is a collection of interrelated, but discrete components (services, tasks, workers) that, when coupled with configuration and instantiated in a suitable runtime, together accomplish a unified functional purpose.

Components: runnable (executable) units: virtual machines, containers, Functions-as-a-Service (FaaS).
Workload type: the component's runtime profile, classified by distinguishing points (replicable, daemonized, service addressable).
Supporting services (managed cloud services): load balancers, object storage, databases, (DNS?).
Traits: operational capabilities attached to components. As such, they are operational concerns, as opposed to developer concerns. For instance: manual scaler, autoscaler, ingress, volume mounter.
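The taxonomy above can be sketched as a small data model. This is only an illustration: the class names, field names, workload type values, image names, and trait properties below are all assumptions made for the sketch, not taken from any specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class WorkloadType(Enum):
    # Hypothetical workload types, derived from the distinguishing points above.
    SERVICE = "service"  # replicable, daemonized, service addressable
    WORKER = "worker"    # replicable, daemonized, not addressable
    TASK = "task"        # runs to completion


@dataclass
class Component:
    # A runnable unit delivered by an application developer.
    name: str
    image: str
    workload_type: WorkloadType


@dataclass
class Trait:
    # An operational capability attached to a component.
    kind: str
    properties: dict = field(default_factory=dict)


@dataclass
class Application:
    # Components plus configuration, managed as one logical unit.
    name: str
    components: list = field(default_factory=list)
    traits: dict = field(default_factory=dict)  # component name -> list of Trait

    def attach(self, component_name, trait):
        known = {c.name for c in self.components}
        if component_name not in known:
            raise KeyError(f"unknown component: {component_name}")
        self.traits.setdefault(component_name, []).append(trait)


app = Application(
    name="shop",
    components=[
        Component("frontend", "example/frontend:1.0", WorkloadType.SERVICE),
        Component("mailer", "example/mailer:1.0", WorkloadType.WORKER),
    ],
)
app.attach("frontend", Trait("autoscaler", {"min": 2, "max": 10}))
app.attach("frontend", Trait("ingress", {"host": "shop.example.com"}))
```

Keeping traits outside of `Component` mirrors the split in the taxonomy: the component describes what runs, while traits describe how it is operated.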

Roles and responsibilities:

Application Developers: deliver business value in the form of application code via application components.
- Understand operational characteristics of the application (writes to a /persistent volume, needs 2 vCPUs, listens on port 8088/tcp) but remain unconcerned with how operational requirements are fulfilled.
- Focus on the business domain.
Application Operators: deliver business value by configuring, installing and managing components via application configurations.
- Focus on strategies for operating the application, rather than infrastructure details.
Infrastructure Operators: deliver value by managing low-level infrastructural components and supporting services.
- Focus on how the overall infrastructure is managed.

The Open Application Model (OAM) encourages:

Application management following team structure: app developers (DEV), app operators (SRE), infra operators (INFRA).
An opinionated workflow: app developers throw components over a wall, app operators throw application configurations over a wall, and infrastructure operators satisfy those needs in the cloud infra.

Observability: https://www.honeycomb.io/blog/observability-101-terminology-and-concepts/

Problem Space

Developers should not be burdened with infrastructural concerns.
Operators and runtimes should be free to meet a component's infrastructural needs as they see fit.
A platform should be free to choose a runtime that is capable of running a specific workload type.
Components should be bundled into higher-level systems (abstraction and reuse) as well as reusable blueprints (standardization).
Components and supporting services should be managed uniformly.
Operators should manage discrete resources (components) as a single logical unit (artifact) that comprises an app.
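A minimal sketch of the first three points: the developer declares what a component needs, and the platform is free to pick any runtime able to run that workload type. The names `ComponentNeeds`, `Runtime`, `choose_runtime`, and the two runtimes listed are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentNeeds:
    # Declared by the developer: what is needed, not how it is provided.
    workload_type: str
    vcpus: int
    port: int
    volume_path: str


@dataclass(frozen=True)
class Runtime:
    # Described by the platform: what each runtime is able to run.
    name: str
    supported_workload_types: frozenset
    max_vcpus: int


def choose_runtime(needs, runtimes):
    """Return the first runtime able to run the component, or None."""
    for runtime in runtimes:
        if (needs.workload_type in runtime.supported_workload_types
                and needs.vcpus <= runtime.max_vcpus):
            return runtime
    return None


# Hypothetical runtimes offered by a platform.
RUNTIMES = [
    Runtime("faas", frozenset({"task"}), max_vcpus=1),
    Runtime("containers", frozenset({"service", "worker", "task"}), max_vcpus=16),
]

needs = ComponentNeeds(
    workload_type="service", vcpus=2, port=8088, volume_path="/persistent"
)
chosen = choose_runtime(needs, RUNTIMES)
print(chosen.name)  # containers
```

The developer-facing structure never names a runtime, so the platform can change `RUNTIMES` without touching any component declaration.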

Solution Space

Application runtimes (PaaS): CF Application Runtime, Pivotal Application Service, Flynn, Rio (Rancher), Heroku (Salesforce), Platform.sh, Tsuru, Juju (Canonical), Banzai Cloud Pipeline
Container Runtime: Kubernetes, Mesos, Nomad, Docker Swarm, Amazon ECS, Azure Service Fabric, CF Container Runtime.
- Kubernetes Distribution: CF Container Runtime, Pivotal Container Service, Charmed Kubernetes (Canonical), MicroK8s (Canonical), Rancher Kubernetes, K3s (Rancher), OpenShift, Triton (Joyent), PKE (Banzai Cloud)
Helm: manages the lifecycle of "Kubernetes applications" via charts - artifacts that bundle templates for Kubernetes manifests.
Cloud Native Application Bundle (CNAB): "a standard packaging format for multi-component distributed applications". Packages an application's components AND an installer (invocation image) that is able to manage its lifecycle via well-known verbs ("install", "upgrade", "uninstall").
Cloud Native Buildpacks: "a higher-level abstraction for building apps compared to Dockerfiles"
Kubernetes Operators: "software extensions to Kubernetes that make use of custom resources to manage applications and their components". Operators automate Day-1 and Day-n activities by putting operational knowledge into software and abstract applications into declarative resources in order to create, configure, and manage instances of complex stateful applications.
- AWS Service Operator
- Kubernetes Operator for Java
Kubernetes Service Catalog: "an extension API that enables applications running in Kubernetes clusters to easily use external managed software offerings, such as a datastore service offered by a cloud provider."
Service broker: an implementation of the Open Service Broker API that enables platforms to provision, get access to and manage the services offered by the broker.
Operator Lifecycle Manager: A Kubernetes Operator for Kubernetes Operators. Provides "a declarative way to install, manage, and upgrade Operators and their dependencies in a cluster".
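To make "putting operational knowledge into software" more concrete, here is a rough sketch of the control-loop idea commonly associated with Operators: compare the declared (desired) state with the observed state and derive the actions needed to converge. It is plain Python with no Kubernetes client involved; the `reconcile` function, the state fields (`version`, `replicas`), and the action names are assumptions made for this illustration.

```python
def reconcile(desired, observed):
    """Return the actions needed to move observed state toward desired state.

    `desired` stands in for a declarative custom resource; `observed` is what
    currently runs, or None when nothing has been installed yet.
    """
    actions = []
    if observed is None:
        # Day-1 activity: nothing exists yet, so install it.
        actions.append(("install", desired["version"]))
        return actions
    if observed["version"] != desired["version"]:
        # Day-2 activity: an instance exists but runs a different version.
        actions.append(("upgrade", desired["version"]))
    if observed["replicas"] != desired["replicas"]:
        # Day-2 activity: adjust capacity to match the declaration.
        actions.append(("scale", desired["replicas"]))
    return actions


desired = {"version": "12.3", "replicas": 3}

print(reconcile(desired, None))
# [('install', '12.3')]
print(reconcile(desired, {"version": "12.2", "replicas": 1}))
# [('upgrade', '12.3'), ('scale', 3)]
print(reconcile(desired, {"version": "12.3", "replicas": 3}))
# []
```

An empty action list means the observed state already matches the declaration, which is the steady state such a loop aims for.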

Deployment platform spectrum

From least mature to most mature:

Bespoke runtime (scheduling, elasticity, etc)
COTS runtime (e.g., Kubernetes, Mesos, Nomad, Amazon ECS)
Platform on top of a (container) runtime
Application Runtimes (PaaS)

Management Automation spectrum

See: "Types of Operators"

(Source: https://operatorframework.io/operator-capabilities/)

Reflections

A platform team exposes an interface (API, control plane) to the organization's infrastructure based on its policies.

One common approach is defining (and creating, and enforcing) a set of tools, but it is not ideal: it exposes an API at the wrong layer.

Kubernetes is more than a container runtime. It gives you an interface to the organization's infrastructure (via its apiserver)! Doing everything via Kubernetes (applications and managed services) achieves the goal of exposing a uniform API to the org, and the teams can choose whatever tool talks to that API (Terraform, kubectl, Chef, Ansible, you name it).
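As a rough illustration of what "uniform API" means here, the sketch below builds two manifests as plain data: a standard `apps/v1` Deployment and a managed database expressed as a custom resource. The `example.org/v1alpha1` API group, the `ManagedDatabase` kind, its spec fields, and the image name are hypothetical. The point is only that both share the same envelope and serialize to the same kind of document, which any tool could submit.

```python
import json

# Top-level fields shared by the resources in this sketch.
ENVELOPE = ("apiVersion", "kind", "metadata", "spec")

# An application component, described as a standard Deployment.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "frontend"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "frontend"}},
        "template": {
            "metadata": {"labels": {"app": "frontend"}},
            "spec": {
                "containers": [
                    {"name": "frontend", "image": "example/frontend:1.0"}
                ]
            },
        },
    },
}

# A managed service, described as a HYPOTHETICAL custom resource.
managed_database = {
    "apiVersion": "example.org/v1alpha1",
    "kind": "ManagedDatabase",
    "metadata": {"name": "orders-db"},
    "spec": {"engine": "postgres", "storageGiB": 20},
}


def has_uniform_envelope(manifest):
    """Check that a manifest carries the shared top-level fields."""
    return all(key in manifest for key in ENVELOPE)


for manifest in (deployment, managed_database):
    if has_uniform_envelope(manifest):
        # Any tool that speaks the API could send this same document.
        print(json.dumps(manifest, indent=2))
```

Because the application and the managed service look the same at this level, the choice of client tool becomes a team preference rather than a platform decision.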

On the other hand, using tools to define an interface is a practical choice for the platform team. For example, the Terraform Kubernetes provider did not allow defining arbitrary manifests until recently, and freedom of choice on tools can also bring additional support requests to the platform team.

Glossary

Day-1 activities: installation, configuration, etc.
Day-2 (or Day-N) activities: re-configuration, update, backup, failover, restore, etc.
Kubernetes-native application: an application managed by a Kubernetes Operator (as per the Operator Lifecycle Manager docs).
Application runtime:
Container runtime:
Bespoke software: software that a company creates, maintains and runs/operates itself.
Commercial off-the-shelf (COTS) software: software created (and maintained) by a third party but run/operated by the company that uses it.
Managed service: software run/operated and maintained by a third party (which can potentially be another team/department in the same company).

juniorz/platform-engineering.md
