# linux containers from scratch
different to virtual machines
full os in a container has some gotchas
containers are transparent
the host os sees the processes running inside a container
namespaces - chroot is effectively a namespace
pid namespaces isolate processes
the same process can have multiple pids, one seen from inside the namespace, one from outside
c(ontrol)groups - limit resources (cpu / memory) for a group of processes
a minimal container just contains some shell commands; systemd-nspawn starts the container (sketch below)
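A minimal sketch of the namespace mechanism described above, assuming Linux and root: clone a shell into new UTS, PID and mount namespaces with Go's syscall package. These are the same primitives systemd-nspawn and docker build on; the choice of /bin/sh is just for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Run a shell in fresh UTS, PID and mount namespaces.
	// Inside it, `echo $$` prints 1: the shell is pid 1 in its own pid namespace,
	// while the host still sees it under its ordinary pid.
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```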
ip netns - network namespaces
veth - virtual ethernet
ip netns exec - execute command in namespace
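A sketch of that netns/veth flow, shelling out to iproute2 from Go. The namespace name, interface names and addresses here are placeholders, and it needs root.

```go
package main

import (
	"log"
	"os/exec"
)

// run executes one iproute2 command and aborts on failure.
func run(args ...string) {
	if out, err := exec.Command(args[0], args[1:]...).CombinedOutput(); err != nil {
		log.Fatalf("%v: %v\n%s", args, err, out)
	}
}

func main() {
	run("ip", "netns", "add", "demo")                                   // new network namespace
	run("ip", "link", "add", "veth0", "type", "veth", "peer", "name", "veth1")
	run("ip", "link", "set", "veth1", "netns", "demo")                  // move one end inside
	run("ip", "addr", "add", "10.10.0.1/24", "dev", "veth0")
	run("ip", "link", "set", "veth0", "up")
	// `ip netns exec` runs a command inside the namespace
	run("ip", "netns", "exec", "demo", "ip", "addr", "add", "10.10.0.2/24", "dev", "veth1")
	run("ip", "netns", "exec", "demo", "ip", "link", "set", "veth1", "up")
	run("ip", "netns", "exec", "demo", "ping", "-c", "1", "10.10.0.1")  // reach the host end
}
```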
aufs efficient for small files, as it is file based rather than block based
aufs - combine 2 directories by mounting them as one union directory; changes made through the union are copied up into the writable directory
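Roughly what that union mount looks like, as a Go sketch. It assumes an aufs-enabled kernel and root; the /lower, /upper and /merged paths are made up.

```go
package main

import (
	"log"
	"syscall"
)

func main() {
	// /lower is the read-only base, /upper captures writes,
	// /merged is the combined view the container sees.
	opts := "br=/upper=rw:/lower=ro"
	if err := syscall.Mount("none", "/merged", "aufs", 0, opts); err != nil {
		log.Fatal(err)
	}
}
```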
full os in a container - not recommended
fsck used as an example - you get strange errors when it runs inside a container
uts namespace - gives the container its own hostname
# web perf
set a target, then improve on it
2-4s is a good target for page draw
conversion vs landing page speed graph from 2010 - 2012 is interesting - things have changed
users are less tolerant of slowness
html delay - needs to be less than 200ms to not have an impact
should test this yourself by slowing your site down and seeing how much worse it gets (sketch below)
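One way to run that experiment, sketched in Go: wrap your handlers in a middleware that injects an artificial delay. The 300ms value, port and handler are arbitrary placeholders for your own site.

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// withDelay slows every response by d, simulating a slower site.
func withDelay(d time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(d)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})
	log.Fatal(http.ListenAndServe(":8080", withDelay(300*time.Millisecond, mux)))
}
```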
studies on perception - below 100ms feels instant
100-300ms - users start to see the delay
300ms - 1s - "the machine is working"
1s+ - mental context switch
over 10s - users abandon the task
users think your site is slower than it actually is
load time and time to interact have increased in the last year
3rd party calls make up to 50% of requests
audit your 3rd party scripts
lots of talk about 4th and 5th party calls
spof-o-matic extension (flags potential single points of failure)
# coreos
make things way more efficient by being opinionated
no package manager on coreos
only run docker containers - no ruby / python running in the os
no contract between the os and you
read-only /usr on the base os
can't install new things into it
forked from chromium os / chrome os - the auto-updating model comes from there
containers live on a different mount point, so they don't touch the coreos install
updates are automatic; each machine reboots to apply them, using etcd as a lock so the whole cluster doesn't reboot at once
multiple nodes form a coreos cluster
each node runs docker and etcd
should focus on the cluster, not the node
flannel - allows each container to get its own ip, routable from other hosts
tun adapter, udp encapsulation
etcd - centralised kv store
stores cluster state and config about the cluster (usernames / passwords / urls); also used for service discovery
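A sketch of using etcd as that central kv store through its HTTP v2 keys API (client port 4001, as CoreOS shipped at the time). The key name and value here are made up.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	base := "http://127.0.0.1:4001/v2/keys"

	// Set a key (the v2 API takes a form-encoded "value" on PUT).
	form := strings.NewReader(url.Values{"value": {"db.example.internal"}}.Encode())
	req, err := http.NewRequest("PUT", base+"/config/db-url", form)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()

	// Read it back; the response is a JSON node containing the stored value.
	resp, err = http.Get(base + "/config/db-url")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```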
fleet is a scheduler that lets you install things into the cluster
kubernetes is another scheduler
declarative deploys - tell the scheduler you want 3 instances of an app and it figures out where to run them
clones https://github.com/kelseyhightower/coreos-ops-tutorial
provision a dedicated etcd stack - you don't want etcd on nodes doing other work, it's bad for the raft protocol
creates a gce instance
uses 4 meg go containers - docker scratch containers, no base images
control node has a public and a private ip
set up some ssh tunnels to communicate with fleet, etcd etc
prep cloud-configs - coreos's take on cloud-init, written in go
can use this to give metadata and configure the host
list the systemd services that should be running
start etcd in a mode where it never becomes master on this host
also start fleet and flannel
flannel sets up a bridge for docker and gives you 1st class IP addresses
enable systemd to log to a unix socket for centralising logs; coreos has no syslog because systemd provides it
turn off updates for the tutorial
run a sed command to put the ip of the host machine into the config files in the tutorial repo
also tell flannel the subnet it is part of by putting data in etcd (sketch below)
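A sketch of handing flannel its overlay network by writing config into etcd, here by shelling out to etcdctl's v2 `set` from Go. The /coreos.com/network/config key is the one flannel watches; the CIDR is an example value.

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// flannel reads its network config from this etcd key on startup.
	out, err := exec.Command("etcdctl", "set",
		"/coreos.com/network/config", `{"Network": "10.10.0.0/16"}`).CombinedOutput()
	if err != nil {
		log.Fatalf("etcdctl: %v\n%s", err, out)
	}
	log.Printf("flannel config stored: %s", out)
}
```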
now set up 5 worker nodes
can pass an option that passes cloud-config metadata to automate provisioning
can now list all the machines with fleet; fleet will ssh to the control node and then list the metadata of the new nodes (because they registered with etcd on start?)
can list the subnets created in etcd
log into node1 in the cluster
download a busybox docker container so we can run netcat
can now ssh into node2, do the same thing, and the two containers can communicate, thanks to flannel
used to demo that flannel is working
this communication is done without going through the control host?
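A stand-in for that busybox/netcat check, sketched in Go: run it with `listen` in a container on one node and `dial <container-ip>` in a container on another. If flannel is working, the two containers reach each other directly on their own IPs; port 9000 is arbitrary.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"os"
)

func main() {
	switch {
	case len(os.Args) > 1 && os.Args[1] == "listen":
		// Container on node1: wait for one connection and print what arrives.
		ln, err := net.Listen("tcp", ":9000")
		if err != nil {
			log.Fatal(err)
		}
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		line, _ := bufio.NewReader(conn).ReadString('\n')
		fmt.Print("received: ", line)
	case len(os.Args) > 2 && os.Args[1] == "dial":
		// Container on node2: dial the other container's flannel IP.
		conn, err := net.Dial("tcp", os.Args[2]+":9000")
		if err != nil {
			log.Fatal(err)
		}
		fmt.Fprintln(conn, "hello from the other node")
		conn.Close()
	default:
		log.Fatal("usage: listen | dial <ip>")
	}
}
```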
docker keeps track of the ips it is handing out - docker acts like a dhcp server
carve off a chunk of the subnet for flannel
logging in coreos - a docker / systemd question
serve out the systemd journal over http / syslog / json
demos shoving logs into logentries.com
simple go program to do this (rough sketch below)
use fleet to tell every node to run this service
start from the scratch image, which is an empty tarball, and add a static go binary to create the logging container
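A rough sketch of that "simple go program": follow the systemd journal as JSON and forward each entry, prefixed with a logentries token, over TCP. The endpoint, port and LE_TOKEN environment variable are assumptions here; in the talk the token was pulled from etcd instead.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"os/exec"
	"os"
)

func main() {
	token := os.Getenv("LE_TOKEN")

	conn, err := net.Dial("tcp", "data.logentries.com:10000")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// journalctl -o json -f streams one JSON object per line as entries arrive.
	cmd := exec.Command("journalctl", "-o", "json", "-f")
	out, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	scanner := bufio.NewScanner(out)
	for scanner.Scan() {
		// One token-prefixed line per journal entry.
		fmt.Fprintf(conn, "%s %s\n", token, scanner.Text())
	}
}
```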
fleet metadata - Global=true means it needs to run on every node in the cluster
fleet deals with bringing up new nodes
grab the token for logentries from etcd
start the service - fleetctl start <unit file>
now when we list all units we see the journal service running on all machines
talk to the systemd journal, can aggregate logs
if the service fails, fleet starts it again, so if a node dies the service is started somewhere else
now how do i install debugging tools? i need tcpdump
create a toolbox container - a bash script grabs a container image, pulls it apart, and runs it with systemd-nspawn
set this container as your login shell, so every time you ssh into one of the nodes it runs this container
this runs as a privileged container in the same namespace as the host, so it has access to all the network interfaces (including the containers running on the control machine) and we can tcpdump them
a container is just a process, not a vm - just cgroups / namespaces
kubernetes - don't use in production (yet)
kubernetes is more featureful than fleet, but you can use fleet to install it
kubernetes doesn't expose docker
pod concept - https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/pods.md
every container should have a 1st class network - hence flannel
the kubelet service talks to docker
only run this on the workers / nodes, not the control machine
kube-proxy - service discovery: match port 80 to all of the pods behind that service and round-robin route to them; you don't have to use this (the speaker has used nginx for it before)
kube-apiserver - just run on the control node
fleet schedules services to run on the control node as well as the workers
kube-controller - replication
fleet metadata - say "be on whichever server the api is on" - creates a grouping of services
controller-manager / scheduler also follow the api-server
the scheduler is also pluggable; it has its own keyspace and manages writing to etcd to tell nodes what services should be running
kube-register - if a machine is healthy and matches the metadata, register it with the master server
list minions - kubernetes' concept of nodes
etcd is chubby! (the google lock-service paper / project that etcd resembles)
a replication controller is like an ec2 autoscaling group - how many pods do you want to run?
a local memcache instance for each app - multiple containers within a pod
the example only uses 1 container in the pod
declarative system - say the state you want for the replication controller
just start the pod; it ends up on 1 particular host
kubernetes spins up a network container; everything in the pod shares its network namespace / resources
very easy to horizontally scale - edit the number of replicas in the json file
now have 4 containers running but can't access them - the service proxy deals with this
port 80 belongs to the hello service; any request that comes in on 80 finds the containers matching the metadata in the service config (toy sketch below)
the service proxies watch etcd, so straight after adding the config the machines are listening on 80
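A toy illustration of what the service proxy is doing: accept on the service port and round-robin connections across a list of pod addresses. The real kube-proxy watches etcd for the pod list; the addresses below are placeholders.

```go
package main

import (
	"io"
	"log"
	"net"
)

func main() {
	pods := []string{"10.10.1.2:80", "10.10.2.2:80", "10.10.3.2:80"}
	ln, err := net.Listen("tcp", ":80")
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; ; i++ {
		client, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		backend := pods[i%len(pods)] // round-robin choice of pod
		go func(c net.Conn, addr string) {
			defer c.Close()
			b, err := net.Dial("tcp", addr)
			if err != nil {
				log.Printf("dial %s: %v", addr, err)
				return
			}
			defer b.Close()
			// Pipe bytes both ways between client and chosen pod.
			go io.Copy(b, c)
			io.Copy(c, b)
		}(client, backend)
	}
}
```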
rolling releases by creating a new track for the replication controller's containers, with the new version of the code
the canary has the same port number as the original release, so traffic rotates between them while both services are running
once happy, do a rolling upgrade
could just delete the canary, then roll out the new stable
or just update the stable replication controller
first configure etcd to reference the new image
then do a rolling restart of the pods
kubernetes just does this when it sees etcd change?
stateful containers are hard
multiple clusters - a stateful cluster with the data + a stateless one
can pin services to a host
can't do live migrations
if a machine goes down, reschedule the unit
fleet has an api
don't run etcd across multiple dcs
use separate clusters instead
data for etcd is in the github repo, so you can replay it
https://coreos.com/docs/cluster-management/setup/cluster-architectures/
could use consul instead of etcd if you want
consul vs etcd is like chef vs puppet
etcd is based on chubby - very simple, composed with external tools (more unixy?)
consul has loads more built in - health checks, service discovery, dns
flannel + kubernetes are built on top of etcd
adding new nodes just rebalances everything
flannel uses udp to get through the first hop
next step is vxlan