Colorado Springs Docker Meetup 2/22/17
Orchestrating Least Privilege Docker Distributed Systems Summit
The majority of container orchestrators use the Raft consensus algorithm (raft.github.io)
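Side note on why Raft-based clusters are usually sized with an odd number of nodes: decisions need a strict majority (a quorum). A minimal sketch of that arithmetic follows; the function names are made up for illustration and do not come from any orchestrator's API.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    # Compare a few cluster sizes; note that 4 nodes tolerate no more
    # failures than 3 do, which is why odd sizes are the usual choice.
    for n in (1, 3, 4, 5, 7):
        print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still only tolerates 1 failure, so the extra node buys nothing for availability.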
###############################
Mesos
- Why:
- Abstraction layer that allows you to treat many computers as a single target
- Provides fault tolerance infrastructure
- Supports more than docker containers
What do you need for Mesos:
- Mesos
- ZooKeeper - provides the quorum (referred to as service discovery)
- Marathon - scheduler
- Docker
- You can point a Hadoop scheduler to the Mesos Master
Largest customer: Twitter runs on it - 100K nodes
Kubernetes:
What you need:
- etcd
- Kubernetes
- Docker
- cAdvisor - for monitoring
* Hardest part about Kubernetes is the networking
###############################
Swarm:
- Has automatic TLS with docker
- Seems like since Docker 1.12.1, Docker has caught up to k8s.
- No longer requires installing a separate key/value store
- Multi-container deployment called “stacks”
- Individual containers in a stack called a “service” (like compose)
- Stacks can be created via ‘docker stack’ command using
- Distributed Application Bundle (DAB) file
- (since Docker Engine 1.13) Compose file
- Individual services can be created via ‘docker service’ command
- Node failures now will reschedule containers to healthy nodes!
- Stack via DAB
- works with 1.12.1 with swarm mode enabled
- Most portable deployment configuration format
- exposes random container ports as needed
- doesn’t allow for creation of volumes
- Stack via compose file
- works in docker engine 1.13+
- use docker stack deploy --compose-file
- will maintain configured exposed ports
- will maintain configured volumes
- Compose File format Version 3
- Works in Docker engine 1.13+
- Adds deployment configurations to services
- mode
- global (one container per swarm node)
- replicated (specified number of containers)
- replicas (number of containers to deploy in replicated mode)
- placement (node/label constraints)
- update_config (how containers get updated)
- resources (limits/reservations on cpu/mem/etc.)
- restart_policy (condition, delay, max_attempts, window)
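A rough sketch of how the Version 3 deploy options listed above fit together. It is written as a Python dict that mirrors the Compose file layout and is printed as JSON, which YAML parsers generally accept. The service name `web`, the image `nginx:alpine`, and every value here are illustrative assumptions, not settings from the talk.

```python
import json


def build_stack(replicas: int = 3) -> dict:
    """Return a Compose v3 style document with one service and a deploy section."""
    return {
        "version": "3",
        "services": {
            "web": {  # hypothetical service name
                "image": "nginx:alpine",  # illustrative image
                "ports": ["80:80"],
                "deploy": {
                    "mode": "replicated",  # or "global" (then omit replicas)
                    "replicas": replicas,
                    "placement": {"constraints": ["node.role == worker"]},
                    "update_config": {"parallelism": 1, "delay": "10s"},
                    "resources": {
                        "limits": {"cpus": "0.50", "memory": "256M"},
                        "reservations": {"cpus": "0.25", "memory": "128M"},
                    },
                    "restart_policy": {
                        "condition": "on-failure",
                        "delay": "5s",
                        "max_attempts": 3,
                        "window": "120s",
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the document; the output could be saved to a file and handed to
    # docker stack deploy --compose-file on a 1.13+ engine in swarm mode.
    print(json.dumps(build_stack(), indent=2))
```

Generating the file from code is only one option; the same structure is normally written by hand as YAML.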
#################################
Honorable mentions:
- Rancher
- Replicated (Triton) * The value is taking a SaaS application and running it on-prem
- AWS (ECS, Elastic Beanstalk)
- Pivotal Cloud Foundry
###################################
Monitoring:
Users - yoozers - A distributed fault injection system
- The containers are way too volatile to continually monitor with tools like Nagios
- Prometheus was chosen
- Pushgateway for VERY short-lived workloads, even as little as a shell script
- Scale: grabs 100,000s of metrics per second and puts them into a CSV
- Alertmanager: recommended to run two of them and send alerts to both at the same time
- Contributing to Grafana as a data source endpoint
- Technically came from SoundCloud (3 engineers left Google and went to SoundCloud)
- Open-source re-invention of Borgmon
- Pull: scrape a plain-text page or key/value store
- read values into memory
- Evaluate rulesets with vector arithmetic
- typically every 10 or 60 seconds, starting all at once (some recommend every 5 seconds)
- send alerts
- record to TSDB (InfluxDB, Graphite); however, it doesn’t need one (the most popular one is Graphite)
- /metrics page (served over HTTP) shows you plain-text counters
- Bad alerts
- tell me when the file system is @ 90% full
- Better alerts (what Prometheus does)
- How long before the disk is full
- Less than x hours of space based on current rate of change
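To make the "better alerts" idea concrete, here is a minimal sketch of estimating hours until a disk fills from its current rate of change. It is not how Prometheus does it internally: Prometheus's `predict_linear` function fits a regression over a range of samples, while this sketch only uses the first and last sample. The function names and the 4-hour threshold are assumptions for illustration.

```python
def hours_until_full(samples, capacity_bytes):
    """Estimate hours until the disk is full.

    samples: list of (timestamp_seconds, used_bytes) tuples, oldest first.
    Returns float("inf") when usage is flat or shrinking.
    """
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    rate = (used1 - used0) / (t1 - t0)  # bytes per second
    if rate <= 0:
        return float("inf")
    return (capacity_bytes - used1) / rate / 3600.0


def should_alert(samples, capacity_bytes, threshold_hours=4.0):
    """Alert when fewer than threshold_hours of space remain at the current rate."""
    return hours_until_full(samples, capacity_bytes) < threshold_hours


if __name__ == "__main__":
    # 100 GB disk that grew from 50 GB to 60 GB used over one hour.
    history = [(0, 50 * 10**9), (3600, 60 * 10**9)]
    print(hours_until_full(history, 100 * 10**9))
    print(should_alert(history, 100 * 10**9, threshold_hours=5.0))
```

A disk sitting at 95% but not growing never triggers this alert, while a disk at 60% that is filling fast does, which is the point of alerting on rate rather than a fixed percentage.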
- cAdvisor (container Prometheus endpoint)
- Container running
- Container metrics: CPU, memory, network in/out
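For a feel of what the plain-text counters on a metrics page look like, here is a small sketch that renders a counter in the Prometheus text exposition style (HELP line, TYPE line, then one sample per label set). The metric name `demo_requests_total` and the `container` label are hypothetical, not real cAdvisor metric names, and label values are not escaped in this sketch.

```python
def render_counter(name, help_text, samples):
    """Render one counter as plain text.

    samples: list of (labels_dict, value) pairs; labels_dict may be empty.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between runs.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "demo_requests_total",   # hypothetical metric name
        "Requests served.",
        [({"container": "web"}, 42), ({"container": "db"}, 7)],
    ), end="")
```

In practice an official Prometheus client library would produce this output; the sketch is only meant to show the shape of the text that gets scraped.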
**** Compose has caught up a LOT since the last update.
**** Your syntax will be the same locally as it will be in a production or development environment if you are running Docker Swarm.
**** 9/16 - Docker 1.12 swarm was compared to k8s, and swarm killed k8s in every category.