"Those that know, do. Those that understand, teach."
Aristotle (supposedly)
Scope: Building and running web-scale distributed cloud systems with container technologies :P
- Establish the challenges for engineering cloud-native systems (the problem space).
- Establish the approaches to address those challenges (the solution space).
- Establish a taxonomy for this segment of the software development industry (the domain).
- Define evaluation criteria for tools/products that address the challenges (the solution space).
- Assess the landscape of the solution space and how its offerings fit the market.
Don't build a space pen (or use it) unless it's an essential part of the problem.
A cloud native application is a collection of interrelated, but discrete components (services, tasks, workers) that, when coupled with configuration and instantiated in a suitable runtime, together accomplish a unified functional purpose.
- Components: runnable units / executable units: virtual machines, containers, Functions-as-a-Service (FaaS).
- Workload type: the component's runtime profile according to distinguishing points (replicable, daemonized, service addressable).
- Supporting services (managed cloud services): load balancers, object storage, databases, (DNS?).
- Traits: operational capabilities, and as such operational concerns as opposed to developer concerns. For instance: manual scaler, autoscaler, ingress, volume mounter.
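One way to picture the model above is as plain data types. The following is a minimal sketch and not the OAM schema: every class and field name (`WorkloadType`, `Component`, `Trait`, `Application`, `attach`) is hypothetical and chosen only to mirror the terms in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class WorkloadType:
    """Runtime profile of a component, described by its distinguishing points."""
    name: str
    replicable: bool
    daemonized: bool
    service_addressable: bool


@dataclass
class Component:
    """A runnable unit: a virtual machine, a container, or a function."""
    name: str
    kind: str  # hypothetical values: "vm", "container", "faas"
    workload: WorkloadType


@dataclass
class Trait:
    """An operational capability attached by an operator, not by the developer."""
    name: str
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Application:
    """Components plus configuration that together serve one functional purpose."""
    name: str
    components: List[Component] = field(default_factory=list)
    supporting_services: List[str] = field(default_factory=list)
    traits: Dict[str, List[Trait]] = field(default_factory=dict)  # keyed by component name

    def attach(self, component_name: str, trait: Trait) -> None:
        """Attach a trait to an existing component; reject unknown components."""
        if component_name not in {c.name for c in self.components}:
            raise KeyError(component_name)
        self.traits.setdefault(component_name, []).append(trait)


# Illustrative workload type: a long-running, replicable, addressable service.
SERVER = WorkloadType("server", replicable=True, daemonized=True, service_addressable=True)

app = Application(
    name="shop",
    components=[Component("frontend", "container", SERVER)],
    supporting_services=["load balancer", "object storage"],
)
app.attach("frontend", Trait("manual scaler", {"replicas": 3}))
```

Keeping traits outside the component definition reflects the split described above: the component says what it is, while the operator decides which operational capabilities to bind to it.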
- Application Developers: deliver business value in form of application code via application components.
- Understand operational characteristics of the application (writes to a /persistent volume, needs 2 vCPUs, listens on port 8088/tcp) but remain unconcerned with how operational requirements are fulfilled.
- Focus on the business domain.
- Application Operators: deliver business value by configuring, installing and managing components via application configurations.
- Focus on strategies for operating the application, rather than infrastructure details.
- Infrastructure operators: deliver value by managing low-level infrastructural components and supporting services.
- Focus on how the overall infrastructure is managed.
The OAM encourages:
- Application management following team structure: app developers (DEV), app operators (SRE), infra operators (INFRA).
- An opinionated workflow: app developers throw components over a wall, app operators throw application configurations over a wall, and infrastructure operators satisfy those needs in the cloud infra.
Observability: https://www.honeycomb.io/blog/observability-101-terminology-and-concepts/
- Developers should not be burdened with infrastructural concerns.
- Operators and runtimes should be free to meet a component's infrastructural needs as they see fit.
- A platform should be free to choose a runtime that is capable of running a specific workload type.
- Bundling components in higher-level systems (abstraction and reuse) as well as reusable blueprints (standardization).
- Managing components and supporting services uniformly.
- Operators should manage discrete resources (components) as a single logical unit (artifact) that comprises an app.
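The principle that a platform picks any runtime capable of running a given workload type can be sketched as a simple capability lookup. This is an illustration under stated assumptions: the runtime names, the workload type names, and the `choose_runtime` helper are all hypothetical and do not come from any real platform API.

```python
from typing import Dict, Optional, Set

# Hypothetical capability table: which workload types each runtime can run.
RUNTIME_CAPABILITIES: Dict[str, Set[str]] = {
    "container-runtime": {"server", "worker", "task"},
    "faas-runtime": {"task"},
}


def choose_runtime(workload_type: str, preferred: Optional[str] = None) -> str:
    """Return a runtime able to run the workload type.

    The platform, not the developer, makes this choice; `preferred` is only
    honoured when that runtime is actually capable.
    """
    if preferred and workload_type in RUNTIME_CAPABILITIES.get(preferred, set()):
        return preferred
    # Iterate in sorted order so the choice is deterministic.
    for runtime in sorted(RUNTIME_CAPABILITIES):
        if workload_type in RUNTIME_CAPABILITIES[runtime]:
            return runtime
    raise LookupError(f"no runtime can run workload type {workload_type!r}")
```

The component only names its workload type; whether a task ends up on the container runtime or the FaaS runtime stays a platform decision.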
- Application runtimes (PaaS): CF Application Runtime, Pivotal Application Service, Flynn, Rio (Rancher), Heroku (Salesforce), Platform.sh, Tsuru, Juju (Canonical), Banzai Cloud Pipeline.
- Container Runtime: Kubernetes, Mesos, Nomad, Docker Swarm, Amazon ECS, Azure Service Fabric, CF Container Runtime.
- Kubernetes Distribution: CF Container Runtime, Pivotal Container Service, Charmed Kubernetes (Canonical), MicroK8s (Canonical), Rancher Kubernetes, K3s (Rancher), OpenShift, Triton (Joyent), PKE (Banzai Cloud).
- Helm: manages the lifecycle of "Kubernetes applications" via charts - artifacts that bundle templates for Kubernetes manifests.
- Cloud Native Application Bundle (CNAB): "a standard packaging format for multi-component distributed applications". Packages an application's components AND an installer (invocation image) that is able to manage its lifecycle via well-known verbs ("install", "upgrade", "uninstall").
- Cloud Native Buildpacks: "a higher-level abstraction for building apps compared to Dockerfiles".
- Kubernetes Operators: "software extensions to Kubernetes that make use of custom resources to manage applications and their components". Operators automate Day-1 and Day-n activities by putting operational knowledge into software, and abstract applications into declarative resources in order to create, configure, and manage instances of complex stateful applications.
- Kubernetes Service Catalog: "an extension API that enables applications running in Kubernetes clusters to easily use external managed software offerings, such as a datastore service offered by a cloud provider."
- Service broker: an implementation of the Open Service Broker API that enables platforms to provision, get access to and manage the services offered by the broker.
- Operator Lifecycle Manager: a Kubernetes Operator for Kubernetes Operators. Provides "a declarative way to install, manage, and upgrade Operators and their dependencies in a cluster".
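The Operator idea described above, operational knowledge encoded in software that drives declarative resources, is commonly pictured as a reconcile loop that compares desired state with observed state. The sketch below is a toy version under stated assumptions: it works on in-memory dictionaries of replica counts, uses no Kubernetes client, and the `reconcile` function and resource names are hypothetical.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Compare desired and observed replica counts and return the actions
    needed to converge. `observed` is updated in place to simulate the effect."""
    actions: List[str] = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append(f"scale up {name}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale down {name}: {have} -> {want}")
        observed[name] = want
    # Anything observed but no longer declared is removed.
    for name in sorted(set(observed) - set(desired)):
        actions.append(f"delete {name}")
        del observed[name]
    return actions


desired = {"db": 3}
observed = {"db": 1, "old-cache": 2}
first = reconcile(desired, observed)
second = reconcile(desired, observed)  # already converged, so nothing to do
```

Running the loop a second time yields no actions, which is the property that makes this style suitable for Day-n activities: the same logic handles installation, drift, and cleanup.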
From least mature to most mature:
- Bespoke runtime (scheduling, elasticity, etc)
- COTS runtime (e.g., Kubernetes, Mesos, Nomad, Amazon ECS)
- Platform on top of a (container) runtime
- Application Runtimes (PaaS)
See: "Types of Operators"
(Source: https://operatorframework.io/operator-capabilities/)
A platform team exposes an interface (API, control plane) to the organization infrastructure based on its policies.
One common approach is defining (and enforcing, and creating) a set of tools, but it is not ideal: it exposes an API at the wrong layer.
Kubernetes is more than a container runtime. It gives you an interface to the organization infrastructure (via its apiserver)! Doing everything via Kubernetes (applications and managed services) achieves the goal of exposing a uniform API to the org, and the teams can choose whatever tool talks that API (Terraform, kubectl, Chef, Ansible, you name it).
On the other hand, using tools to define an interface is a practical choice for the platform team. For example, the Terraform Kubernetes provider did not allow defining arbitrary manifests until recently, and freedom of choice in tools can also bring additional support requests to the platform team.
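To make the "uniform API" point concrete: what the apiserver consumes is a manifest, which is just data, so any tool that can produce that data can target the same interface. Here is a minimal sketch that builds an `apps/v1` Deployment as a plain dictionary and prints it as JSON. The `deployment_manifest` helper, the application name, and the image reference are hypothetical; the port echoes the 8088/tcp example from earlier in these notes.

```python
import json
from typing import Any, Dict


def deployment_manifest(name: str, image: str, replicas: int, port: int) -> Dict[str, Any]:
    """Build a Kubernetes Deployment (apps/v1) manifest as plain data."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


# Placeholder image reference; replace with a real registry path.
manifest = deployment_manifest("web", "example.invalid/web:1.0", replicas=2, port=8088)
print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f -` accepts JSON on standard input, output like this could be piped straight to the cluster, and the same data could equally be submitted by any other client of the apiserver.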
See also: Kubernetes standardized glossary
- Day-1 activities: installation, configuration, etc.
- Day-2 (or Day-N) activities: re-configuration, update, backup, failover, restore, etc.
- Kubernetes-native application: application managed by a Kubernetes Operator (as per Operator Lifecycle Manager doc).
- Application runtime:
- Container runtime:
- Bespoke software: software a company creates, maintains and runs/operates itself.
- Commercial off-the-shelf (COTS) software: created (and maintained) by a third party, but run/operated by the company itself.
- Managed service: run/operated and maintained by a third party (which can potentially be another team/department in the same company).
- Architecture optimization for cost? https://twitter.com/mohapatrahemant/status/1102401615263223809
- Patterns to route traffic to a private cluster:
- https://www.getambassador.io/docs/latest/topics/concepts/kubernetes-network-architecture/#routing-traffic-to-your-kubernetes-cluster
- Edge proxy: an L7 (HTTP) proxy that accepts incoming traffic from the external load balancer and routes the traffic to in-cluster services.
- Ingress controller: an edge proxy that can process Kubernetes Ingress resources.
- Observability blueprints for Kubernetes (what to monitor/alert):
- Common Problems:
- Helm (2?) internals
- https://about.gitlab.com/devops-tools/
- "The Structure of Design Problem Space" https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog1603_3
- "Bringing Buildpacks to Kubernetes" https://www.youtube.com/watch?v=kIJ0xBldhYY&t=7s Explores the separation of concerns and responsibilities and how buildpacks fit into the problem space. Good to discuss the problem space.
- https://blog.overops.com/pivotal-cloud-foundry-vs-kubernetes-choosing-the-right-cloud-native-application-deployment-platform/
- https://blog.colinbreck.com/using-quality-views-to-communicate-software-quality-and-evolution/
- https://twitter.com/bgrant0607/status/1121054924979064832
- https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/resource-management.md
- https://twitter.com/bgrant0607/status/1123620689930358786
- https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/declarative-application-management.md
- https://docs.google.com/document/d/1cLPGweVEYrVqQvBLJg6sxV-TrE5Rm2MNOBA_cxZP2WU/edit
- Configuration as Data (or why I hate Helm - even more since playing with cue)