@timmow
Created November 19, 2014 16:29
# linux containers from scratch
different to virtual machines
full os in container has some gotchas
containers are transparent
host os sees process from within container
namespaces - chroot is a namespace
namespace - processes
same process can have multiple pids, one from within namespace, one without
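a rough sketch of this (not from the talk) - unshare gives the shell a new pid namespace, where it sees itself as pid 1 while the host sees its ordinary pid:

```sh
# create a new pid namespace and run a shell in it (needs root)
sudo unshare --pid --fork --mount-proc /bin/sh

# inside the namespace the shell thinks it is pid 1
echo $$          # prints 1

# from another terminal on the host, the same shell has a normal pid
ps -ef | grep /bin/sh
```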
c(ontrol)groups - limit resources to a group, cpu / memory
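rough cgroup sketch (not from the talk; paths assume the legacy cgroup v1 memory hierarchy):

```sh
# create a memory cgroup and cap it at 256 MB
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes

# move the current shell into the group; its children inherit the limit
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs
```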
minimal container contains shell commands, systemd-nspawn starts the container
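e.g. a minimal rootfs built by hand (paths and applet list are illustrative):

```sh
# tiny rootfs: just busybox plus a few applet symlinks
mkdir -p /srv/minimal/bin /srv/minimal/etc
touch /srv/minimal/etc/os-release       # nspawn checks this file exists
cp /bin/busybox /srv/minimal/bin/       # assumes a statically linked busybox
for applet in sh ls ps mount; do ln -s busybox /srv/minimal/bin/$applet; done

# boot a shell inside it as a container
sudo systemd-nspawn -D /srv/minimal /bin/sh
```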
ip netns - network namespaces
veth - virtual ethernet
ip netns exec - execute command in namespace
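sketch of the netns / veth plumbing (addresses are made up):

```sh
# create a network namespace and a veth pair, put one end inside it
sudo ip netns add demo
sudo ip link add veth-host type veth peer name veth-demo
sudo ip link set veth-demo netns demo

# address both ends and bring them up
sudo ip addr add 10.0.0.1/24 dev veth-host
sudo ip link set veth-host up
sudo ip netns exec demo ip addr add 10.0.0.2/24 dev veth-demo
sudo ip netns exec demo ip link set veth-demo up

# run a command inside the namespace
sudo ip netns exec demo ping -c1 10.0.0.1
```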
aufs efficient for small files, as file based rather than block based
aufs - 2 directories, combined by mounting into a merged directory; changes made
through the merged mount land in the writable (upper) directory
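sketch of the union mount (assumes an aufs-enabled kernel; branch syntax may vary):

```sh
# combine a read-only lower dir and a writable upper dir into one mount
mkdir lower upper merged
echo "original" > lower/file.txt
sudo mount -t aufs -o br=$PWD/upper=rw:$PWD/lower=ro none merged

# writes through the merged mount land in the upper branch, lower is untouched
echo "changed" > merged/file.txt
cat upper/file.txt
```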
full os in container - not recommended
fsck used as an example, get strange errors with container
uts namespace
# web perf
set a target, improve on it
2-4s good target for page draw
conversion vs landing page graph 2010 - 2012 is interesting - things have
changed
users less tolerant of slowness
html delay - needs to be less than 200ms to not affect things
should test this yourself by slowing your site down, see how much worse it gets
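one way to fake a slow site (not from the talk) - add artificial latency with netem and browse your own pages:

```sh
# add 300ms of delay to all traffic leaving eth0, then measure the impact
sudo tc qdisc add dev eth0 root netem delay 300ms

# remove it afterwards
sudo tc qdisc del dev eth0 root
```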
studies on perception - below 100ms feels instant
100-300 start to see delay
300ms - 1s - machine is working
1s+ mental context switch
over 10s - abandon task
users think your site is slower than it is
load time and time to interact has increased in the last year
3rd party calls up to 50% of requests
audit 3rd party scripts
lots of talk about 4th, 5th party calls
spof-o-matic extension
# coreos
make things way more efficient by being opinionated
no package manager on coreos
only run docker containers. No ruby / python running in os
no contract between os and you
read only /usr on base os
cant install new things under /usr
fork of chromium? chrome os. took the auto-updating model from this
containers live on different mount point, doesnt touch coreos install
updates are automatic. Each machine (container?) reboots, using etcd as a lock
multiple nodes for a coreos cluster
each node runs docker, etcd
should focus on cluster, not node
flannel - allows each container to get an ip on the host
tun adapter, udp encapsulation
etcd centralised kv store
store cluster state, config about cluster (usernames / passwords / urls). also
service discovery
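e.g. etcdctl (v2-era syntax, values made up):

```sh
# store and read back shared config
etcdctl set /myapp/database/url "postgres://db.internal:5432/app"
etcdctl get /myapp/database/url

# list keys under a directory, e.g. for service discovery
etcdctl ls --recursive /myapp
```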
fleet is a scheduler, allows installs into the cluster
kubernetes is another scheduler
declarative deploys, tell scheduler 3 instances of app, scheduler figures it out
clones https://github.com/kelseyhightower/coreos-ops-tutorial
provision a dedicated etcd stack. Dont want etcd on nodes doing other work, bad
for the Raft protocol
creates a gce instance
uses 4 meg go containers - docker scratch containers, no base images
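rough sketch of how such an image is built (names are illustrative):

```sh
# statically linked go binary, packaged on an empty base image
CGO_ENABLED=0 go build -o app .

cat > Dockerfile <<'EOF'
FROM scratch
ADD app /app
ENTRYPOINT ["/app"]
EOF

docker build -t myorg/app .
```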
control node has a public and private ip
set up some ssh tunnels to comm with fleet etcd etc
prep cloud configs - similar to cloud-init? cloud-init in go
can use this to give metadata, configure host?
list systemd services that should be running
start etcd in a mode where it never becomes master on host
also start fleet and flannel
flannel sets up a bridge for docker, gives you 1st class IP addresses
enable systemd to log to a unix socket, for centralizing logs. coreos has no
syslog because systemd provides it
turn off updates for tutorial
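roughly what the cloud-config looks like (my sketch, not the tutorial's file; keys follow the coreos docs of the time, token and metadata are placeholders):

```sh
cat > cloud-config.yaml <<'EOF'
#cloud-config
coreos:
  etcd:
    # point this node at the cluster via a discovery url (placeholder token)
    discovery: https://discovery.etcd.io/<token>
  fleet:
    metadata: role=worker
  update:
    # turn off auto-reboots for the tutorial
    reboot-strategy: "off"
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
    - name: flanneld.service
      command: start
EOF
```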
run a sed command to give the ip of the host machine to config files in the
tutorial repo
also tell flannel the subnet it is part of, by putting data in etcd
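the flannel config lives under a well-known etcd key, something like:

```sh
# give flannel the overlay network to carve per-host subnets out of
etcdctl set /coreos.com/network/config '{"Network": "10.10.0.0/16"}'
```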
now setup 5 worker nodes
can pass an option to pass cloud config metadata to automate provisioning
can now list all the machines with fleet. fleet will ssh to control node, and
then list metadata of new nodes (because they registered with etcd on start?)
can list the subnets created in etcd
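sketch of poking at the cluster from a workstation (ports and key paths are the defaults of the era, control node ip is a placeholder):

```sh
# fleetctl can tunnel over ssh to the control node
export FLEETCTL_TUNNEL=<control-node-public-ip>
fleetctl list-machines

# forward etcd's client port to inspect the flannel subnets it allocated
ssh -f -N -L 4001:127.0.0.1:4001 core@<control-node-public-ip>
etcdctl ls /coreos.com/network/subnets
```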
log into node1 in cluster
download a busybox docker container, so can run netcat
can now ssh into node 2, do the same thing, and communicate, thanks to flannel
used to demo that flannel is working
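roughly the demo (addresses are examples):

```sh
# on node1: listen inside a busybox container on its flannel-assigned ip
docker run -it busybox sh
ip addr show eth0        # note the overlay address flannel handed out
nc -l -p 8080

# on node2: connect to that address from another busybox container
docker run -it busybox sh
nc 10.10.63.2 8080       # whatever node1's container ip was
```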
this communication is done without going through control host?
docker keeps track of ips it is handing out, docker is like a dhcp server
carve off a chunk of the subnet for flannel
logging in coreos - docker / systemd question
serve out systemd journal over http / syslog / json
demos shoving logs in logentries.com
simple go program to do this
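the talk's forwarder is a custom go program; for a sketch of the raw material, the journal is already readable as json, and systemd ships an http gateway:

```sh
# stream the journal as json - this is what a small forwarder would consume
journalctl -o json -f

# or serve it over http with systemd's gateway daemon
sudo systemctl start systemd-journal-gatewayd
curl http://localhost:19531/entries
```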
use fleet to tell every node to run this service
start from scratch image which is empty tarball, add static go binary to create
container for logging
fleet metadata - Global=true, it needs to run on every node in cluster
fleet deals with bringing up new nodes
grab the token for logentries from etcd
start the service - fleet start (service control file)
now when we list all units, see journal service running on all machines
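a hypothetical global fleet unit for the forwarder (image name etc made up):

```sh
cat > journal-forwarder.service <<'EOF'
[Unit]
Description=Forward the systemd journal to logentries

[Service]
ExecStart=/usr/bin/docker run myorg/journal-forwarder
Restart=always

[X-Fleet]
# schedule onto every machine in the cluster, including new ones
Global=true
EOF

fleetctl start journal-forwarder.service
fleetctl list-units
```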
talk to systemd journal, can aggregate logs
if the service fails, fleet starts it again, so if node dies, service started
somewhere else
now how do i install debugging tools? I need tcpdump
create a toolbox container - bash script to grab container, pull it apart, run
it with systemd-nspawn
set this container as your login shell, every time you ssh in to one of the
nodes it runs this container
this runs as a privileged container, so has access to all network interfaces
(including the container running on control machine) - in same namespace as host
so we can tcpdump them
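sketch (the default toolbox image and its package manager may differ):

```sh
# coreos ships a `toolbox` helper: pulls a stock distro image and runs it
# privileged with systemd-nspawn, sharing the host's interfaces
toolbox

# inside the toolbox, install and run normal debugging tools
yum install -y tcpdump      # or dnf, depending on the toolbox image
tcpdump -i flannel0 udp     # flannel's tun device is visible from here
```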
container is just a process, not a vm. just using cgroups / namespaces
kubernetes - dont use in production
kubernetes more featureful than fleet, but can use fleet to install it
kubernetes doesnt expose docker
pod concept? - https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/pods.md
every container should have a 1st class network - hence flannel
kubelet service talks to docker
only run this on workers / nodes , not control machine
kube-proxy - service discovery: match port 80 to all pods belonging to that
service, and round-robin route to them. Dont have to use this - have used nginx before
kube-apiserver - just run on control node
fleet schedules services to run on control node as well as workers
kube-controller - replication
fleet metadata - say run on whichever server the api server is on - creates a
grouping of services
controller-manager / scheduler also follows api-server
scheduler also pluggable. scheduler has its own etcd keyspace; manages writing
to etcd to tell nodes what services should be running
kube-register - if machine healthy, and matches metadata, register with master
server
list minions - kubernetes concept of nodes
etcd is chubby! (chubby: the google lock-service paper / project that etcd resembles)
replication controller like ec2 autoscaling group, how many pods do you want to
run?
local memcache instance for each app - multiple containers within pod
example only uses 1 container in the pod
declarative system - say state you want for replication controller
just start the pod, it ends up on 1 particular host
kubernetes spins up a network container. everything in pod shares network
namespace / resources
very easy to horizontally scale - edit number of replicas in json file
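e.g. with a later kubectl (the 2014 tutorial used kubecfg and v1beta1 json; the label app=hello is assumed):

```sh
# scale the replication controller imperatively instead of editing the file
kubectl scale rc hello --replicas=4

# check the pods the controller brought up
kubectl get pods -l app=hello
```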
now have 4 containers running, but cant access them - service proxy deals with
this
port 80 belongs to the hello service. any request comes in for 80, find
containers matching the metadata in service config
service proxies watching etcd, so straight after adding config machines are
listening on 80
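rough shape of the service definition (modern yaml shown, not the tutorial's v1beta1 json; names and ports assumed):

```sh
cat > hello-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello          # match any pod carrying this label
  ports:
    - port: 80          # port the service proxies listen on
      targetPort: 8080  # port the app listens on inside the pod
EOF

kubectl create -f hello-service.yaml
```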
rolling releases by creating a new track for replication controller containers, with new version of
code
canary has same port number as original release, so will rotate between them, as
both services are running
once happy, do a rolling upgrade
could just delete canary, then roll out new stable
or just update stable replication controller
first configure etcd to ref new image
then rolling restart of pods
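sketch with a later kubectl (controller and image names are made up):

```sh
# canary: a second replication controller whose labels still match the
# hello service selector, so traffic rotates across old and new pods
kubectl create -f hello-canary-rc.yaml

# once happy, roll the stable controller to the new image...
kubectl rolling-update hello --image=myorg/hello:v2

# ...or just delete the canary and update stable separately
kubectl delete rc hello-canary
```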
kubernetes just does this when it sees etcd change?
stateful containers are hard
multiple clusters - stateful cluster with data + stateless
can pin services to a host
cant do live migrations
if machine goes down, reschedule the unit
fleet has an api
dont run etcd across multiple dcs
separate clusters
data for etcd in github repo, so can replay it
https://coreos.com/docs/cluster-management/setup/cluster-architectures/
could use consul instead of etcd if you want
consul vs etcd like chef vs puppet
etcd based on chubby - very simple with external tools (more unixy?)
consul has loads more things built in - health checks, service discovery, dns
flannel + kubernetes built on top of etcd
adding new nodes just rebalances everything
flannel uses udp to get through first hop
next step is vxlan