- security?
- same issue as docker images from dockerhub
- http://twitter.com/nixgeek/status/694103481909649409
- secret management
- “secret” vs “sensitive”
- secrets: creds, certs, keys, passwords
- sensitive: phone numbers, mother’s maiden name, emails, dc locations, PII
- we think correctly about secrets
- maybe not so much with sensitive data
- my definition: “anything which makes the news”
- this includes “secrets” and “sensitive” categories above
- vault is designed for this case
- certificates
- specific type of secret
- backed by RFCs
- used almost universally
- historically a pain to manage
- unencrypted secrets in files
- fs perms
- encrypted secrets with the key on the same fs
- eg chef databags
- also, how does the secret get on to the system?
- why don’t we use config management for secrets? because config management tools don’t have features we need:
- no access control
- no auditing
- no revocation
- no key rolling
- why not (online) databases?
- rdbms, consul, zookeeper, etc
- (one of the big motivations to create vault was we didn’t want people using consul to store secrets)
- not designed for secrets
- limited access controls
- typically plaintext storage
- (even if the filesystem is encrypted, which has its own issues)
- no auditing or revocation abilities
- how to handle secret sprawl?
- secret material is distributed
- don’t want one big secret with all the access to everything
- who has access?
- when were they used?
- what is the attack surface?
- in the event of a compromise, what happened? what was leaked?
- can we design a system that allows us to audit what was compromised?
- how to handle certs?
- openssl command line (ugh)
- where do you store the keys?
- if you have an internal CA, how do you manage that?
- how do you manage CRLs?
- vault goals:
- single source for secrets, certificates
- programmatic application access (automated)
- operator access (manual)
- practical security
- modern data center friendly
- (private or cloud, commodity hardware, highly available, etc)
- vault features
- secure secret storage
- full cert mgmt
- dynamic secrets
- leasing, renewal
- auditing
- acls
- secure secret storage
- encrypt data in transit and at rest
- at rest: 256-bit AES in GCM
- TLS 1.2 for clients
- no HSM required (though you can if you want)
- inspired by unix filesystem
- mount points, paths
$ vault write secret/foo bar=bacon
success!
$ vault read secret/foo
Key Value
lease_id ...
lease_duration ...
lease_renewable false
bar bacon
- dynamic secrets
- never provide “root” creds to clients
- secrets made on request per client
$ vault mount postgresql
$ vault write postgresql/config/connection value=...
# written once, can never be read back
$ vault read postgresql/creds/production
-> get back a freshly-created user
-> if you don't come back to vault within an hour, vault will drop the user
- auditing
- pluggable audit backends
- request and response logging
- prioritizes safety over availability
- secrets hashed in audits (salted HMAC)
- searchable but not reversible
- rich ACLs
- flexible auth
- pluggable backends
- machine-oriented vs operator-oriented
- tokens, github, appid, user/pass, TLS cert
- separate authentication from authorization
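- (my sketch, not from the talk) a minimal policy using the old dashed CLI commands; the policy name and path are made up:
$ cat > myapp-policy.hcl <<'EOF'
# let the app read its own secrets and nothing else
path "secret/myapp/*" {
  policy = "read"
}
EOF
$ vault policy-write myapp myapp-policy.hcl
$ vault token-create -policy=myapp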
- high availability
- leader election
- active/standby
- automatic failover
- (depending on backend: consul, etcd, zookeeper)
- “unsealing the vault”
- data in vault is encrypted
- vault requires encryption key, doesn’t store it
- must be provided to unseal the vault
- must be entered on every vault restart
- turtles problem!
- this secret is sometimes on a piece of paper in a physical safe
- alternative: shamir’s secret sharing
- we split the key
- a number of operators have a share of the key
- N shares, T required to recompute master
- default: N:5 T:3
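- (my sketch) roughly how that looks with the Vault 0.x CLI:
$ vault init -key-shares=5 -key-threshold=3
# prints 5 unseal key shares plus an initial root token; hand the shares to 5 operators
$ vault unseal
# run three times after every restart, each operator entering their own share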
- ways to access vault
- http + TLS API, JSON
- CLI
- consul-template (for software you won’t rewrite to talk directly to vault)
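- (my sketch) the HTTP API is just authenticated JSON over TLS; the hostname here is made up:
$ curl -H "X-Vault-Token: $VAULT_TOKEN" https://vault.example.com:8200/v1/secret/foo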
- application integration
- the best way to access vault
- vault-aware
- native client libraries
- secrets only in memory
- safest but high-touch
- consul-template
- secrets templatized into application configuration
- vault is transparent
- lease management is automatic
- non-secret configuration still via consul
- (then put your secrets on to a ramdisk and make sure the ramdisk can’t be swapped)
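- (my sketch, assuming a consul-template build with Vault support, and reusing the secret/foo bar=bacon example from above; file names are made up) rendering a secret onto a ramdisk and reloading the app:
$ cat > app.conf.ctmpl <<'EOF'
{{ with secret "secret/foo" }}password = "{{ .Data.bar }}"{{ end }}
EOF
$ consul-template -template "app.conf.ctmpl:/dev/shm/app.conf:service app reload"
# assumes VAULT_ADDR and a valid token are available in the environment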
- PII
- it’s everywhere
- “transit” backend for vault
- encrypt/decrypt data in transit
- avoid secret management in client apps
- builds on vault foundation
- web server has no encryption keys
- requires two-factor compromise (vault + datastore)
- decouples storage from encryption and access control
$ vault write -f transit/keys/foo
$ vault read transit/keys/foo
-> doesn't return the key itself, just metadata about the key
to send:
$ echo msg | vault write transit/encrypt/foo plaintext=-
to recv:
$ vault write transit/decrypt/foo ciphertext=-
- rich libraries to do this automatically: vault-rails
- disadvantage: everything round-trips through vault
- can increase performance with another trick (…?)
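- note on the transit commands above: transit expects base64-encoded plaintext and hands back base64 on decrypt; a fuller round trip (my sketch) looks like:
$ echo -n "ssn=123-45-6789" | base64
c3NuPTEyMy00NS02Nzg5
$ vault write transit/encrypt/foo plaintext=c3NuPTEyMy00NS02Nzg5
# response contains ciphertext of the form vault:v1:...
$ vault write transit/decrypt/foo ciphertext=vault:v1:...
# response contains the base64 plaintext; pipe it through base64 --decode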
- CA
- vault acts as internal CA
- vault stores root CA keys
- dynamic secrets - generates signed TLS keys
- no more tears
- mutual TLS
$ vault mount pki
$ vault write pki/root/generate/internal common_name=myvault.com ttl=87600h
...
...
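- (my sketch) after generating the root you create a role and issue leaf certs against it; role parameter names vary between Vault versions:
$ vault write pki/roles/internal allowed_domains=myvault.com allow_subdomains=true max_ttl=72h
$ vault write pki/issue/internal common_name=app1.myvault.com
# returns a freshly generated key, cert and CA chain; short TTLs + revocation instead of long-lived certs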
- vault documents their security and threat models! https://www.vaultproject.io/docs/internals/security.html
- certificates with vault
- can be revoked
- never exposes your CA private keys
- can manage intermediate CAs
- secured via ACLs like anything else
- audited like anything else
- do you support letsencrypt certs?
- just started planning this morning
- if I use vault with consul, can I use that consul for something else?
- in theory yes
- vault uses a subpath within consul, and encrypts everything
- we recommend you use an ACL to prevent access to other apps (it’s all encrypted garbage but if someone flips a bit you lose everything)
- what if there’s a rogue operator who’s keylogged?
- if you’re running it the right way it should be okay
- if the operator doesn’t have root on the machine running vault it should be okay
- if the operator has root then they can just coredump the vault process and get the key that way
- a rogue root user is not in our threat model
- puppet chef ansible salt juju
- deeply linked with the OS, from the start, until EOL
- vendors, epel, make install, gems etc
- regular/commodity users -> EPEL
- (gap)
- advanced users
- where is the srpm?
- where is the buildchain?
- they have bugs
- public build system
- everything needed to build software
- software interest group
- topic-focussed
- release RPMs
- can we make a CFGMGMT SIG?
- objectives:
- recent versions of cfg mgmt tools
- we were using CM but not winning
- what we had built with love
- automated tests
- monitoring
- but it was a total failure
- unmanageable rebuild times
- envs were starting to leak
- our systems are “eventually repeatable”
- darn it, test that change in prod
- docker docker docker docker
- solution: stop doing configuration management!
- artifacts and pipelines
- inputs are typically managed artifacts
- change
- feed input to packer which in turn runs a builder that applies change producing output
- output: a versioned artifact (rpm)
- repos, packages, images, containers
- abstraction is key to doing changes
- defns:
- an input-change-output chain is a project
- a project is versioned in git
- all artifacts are testable
- your new job is: describing state to produce artifacts and keeping that state from drifting
- http://nubis-docs.readthedocs.org/en/latest/MANIFESTO/
- change from stateful VMs to managing artifacts
- this worked really well
- packer with masterless puppet
- terraform and ansible
- masterless puppet to audit and correct drift
- yum upgrade considered harmful
- own the tools you run
- replace a bunch of shell scripts
- puppet infrastructure setup sucks
- tried to scale PuppetDB or PE?
- puppet kick (deprecated)
- mcollective is moribund
- SSH - why not?
- centralized secrets
ssh <node> -c 'facter -j'
puppet master --compile > catalog
ssh <node> -c 'puppet apply'
- problem: file server
- shipping files with ssh?
- fix the catalog on the fly – change puppet:// urls
- problem: pluginsync
- custom facts
- reports, exported resources
- they just work
- one node per catalog - why?
- I want partial catalogs, different schedules, different user accounts
- tasks on demand
- testing
- docker docker
- “war stories”
- “in the trenches”
- how is this so much of a thing?
- pop culture depictions of war v common
- mainstream media depictions of war
- yes, our jobs can be stressful
- we can be under pressure
- but it’s still not an actual war situation
- this is about culture
- we’re a corner of OSS community
- it’s not about making great code
- the way we talk to each other matters
- we should be inclusive
- alternatives to “war story”?
- anecdote
- story
- experience report
- I have working python code, how do I start now?
- a proper deployment artifact:
- python package
- (debian package? docker?)
- python package
- it should be uniquely versioned
- it should manage dependencies
- https://github.com/blue-yonder/pyscaffold
- CI:
- run tests, build package
- push to artifact repository
- http://doc.devpi.net
- automated deploy: ansible
- we use virtualenvs to isolate dependencies
- pip doesn’t do true dependency resolution
- maintain and refactor your deployment
- pypa/pip issue #988 (pip needs a dependency resolver)
- OS package managers v pip: the two worlds should unite
- pip is still optimized for a manual workflow (eg no --yes option)
- you can build your own CD pipeline!
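- (my sketch of such a pipeline; the index URL, inventory and playbook names are made up):
# CI job: test, build, publish
$ python setup.py test
$ python setup.py bdist_wheel
$ devpi use https://devpi.example.com/team/prod
$ devpi upload
# deploy job: ansible installs the pinned version into a virtualenv
$ ansible-playbook -i inventory/production deploy.yml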
- front-end load testing
- browser-based load testing tool with scenarios
- history:
- looked at Squish - selenium alternative
- uses chrome
- looked at phantomjs
- headless so it’s good
- missed some APIs
- looked at SlimerJS
- not headless but still performant
- use cases for felt
- quick load test
- FE apps (eg angularJS)
- simulate user
- cfgmgmt not just about files
- but also files
- puppet module: example42/tp
- http://tiny-puppet.com/
- tp::conf { 'nginx': … }
- tp::conf { 'nginx::example.com.conf': … }
- tp::dir { 'nginx::www.example42.com': … }
- tp::test { 'redis': }
- tp::install { 'redis': … }
- puppet 2.7
- dark, hacky features (eg dynamic scoping)
- puppet 3
- functional insanity with some pretty cool new tools and toys
- rspec-puppet, librarian-puppet etc came along
- we upped our game
- puppet 4
- language spec!
- type system!
- lambdas
- iterations
- all the things
- sanity
- ponies!
- as a module maintainer, it’s painful
- maintaining compatibility with 3 and 4 is frustrating
- step 1: breathe
- talk it through on your team
- step 2: get to puppet 3.8 first
- the last 3.x release
- it starts throwing deprecation warnings at you
- fix these
- scoping, templates, etc
- upgrade your modules
- Vox Pupuli and puppetlabs modules Just Work™
- step 3: enable the future parser
- (don’t do this on puppet < 3.7.4)
- types of defaults will matter
- eg where default is empty string but you can pass in an array
- that won’t work any more
- structured node data: the $facts hash - unshadowable, unmodifiable
- unlike the old $operating_system fact lookup style
- step 4: upgrade to puppet 4
- two options:
- distro packages
- move to puppetlabs’s omnibus packages
- recommend using omnibus, but changes some things:
- /var/lib/puppet moves to /opt/puppetlabs/puppet
- step 5: caek
- two choices for the actual upgrade:
- spin up a new master
- point agents to the new master and babysit them one at a time
- pre-compile and compare catalogs
- tools that help:
- less common approach, but tends to go more smoothly
- we did it in:
- 1-2 weeks of prep
- 1 week of rollout
- 2-3 days of cleanup
- 0 production incidents
- (over 10k nodes)
- but.. we cheated
- we migrated to the future parser over a year ago :)
- 60 modules & tooling
- 50 contributors
- basically everyone has commit access to everything
- join the revolution!
- bob: how many things broke when you enabled the future parser?
- weird scoping with templates
- the name: etcd = “/etc distributed”
- a clustered key-value store
- GET and SET ops
- a building block for higher order systems
- primitives for distributed systems
- distributed locks
- distributed scheduling
- 2013.8: alpha (v0.x)
- 2015.2: stable (v2.0+)
- stable replication engine (new Raft impl)
- stable v2 API
- 2016.? (v3.0+)
- efficient, powerful API
- some operations we wanted to support couldn’t be done in the existing API
- highly scalable backend
- (ed: what does this mean?)
- production ready
- coreos mission: “secure the internet”
- updating servers = rebooting servers
- move towards app container paradigm
- need a:
- shared config store (for service discovery)
- distributed lock manager (to coordinate reboots)
- existing solutions were inflexible
- (zookeeper undocumented binary API – expected to use C bindings)
- difficult to configure
- highly available
- highly reliable
- strong consistency
- simple, fast http API
- raft
- using a replicated log to model a state machine
- “In Search of an Understandable Consensus Algorithm” (Ongaro, 2014)
- response to paxos
- (zookeeper had its own consensus algorithm)
- raft is meant to be easier to understand and test
- three key concepts:
- leaders
- elections
- terms
- the cluster elects a leader for every term
- all log appends (…)
- implementation
- written in go, statically linked
- /bin/etcd
- daemon
- 2379 (client requests/HTTP + JSON api)
- 2380 (p2p/HTTP + protobuf)
- /bin/etcdctl
- CLI
- net/http, encoding/json
- eg: have 5 nodes
- can lose 2
- lose 3, lose quorum -> cluster unavailable
- prefer odd-numbers for cluster sizes
- the more nodes you have, the more failures you can tolerate
- but the lower throughput becomes because every operation needs to hit a majority of nodes
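- the quorum arithmetic behind that:
quorum = floor(N/2) + 1, tolerated failures = N - quorum
N=3 -> quorum 2, tolerates 1
N=5 -> quorum 3, tolerates 2
N=7 -> quorum 4, tolerates 3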
- GET /v2/keys/foo
- GET /v2/keys/foo?wait=true
- poll for changes, receive notifications
- PUT /v2/keys/foo -d value=bar
- DELETE /v2/keys/foo
- PUT /v2/keys/foo?prevValue=bar -d value=ok
- atomic compare-and-swap
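- the same calls as curl one-liners against a local member (my sketch):
$ curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar
$ curl http://127.0.0.1:2379/v2/keys/foo
$ curl "http://127.0.0.1:2379/v2/keys/foo?wait=true"
$ curl "http://127.0.0.1:2379/v2/keys/foo?prevValue=bar" -XPUT -d value=ok
$ curl http://127.0.0.1:2379/v2/keys/foo -XDELETE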
- locksmith
- cluster wide reboot lock - “semaphore for reboots”
- CoreOS updates happen automatically
- prevent all machines restarting at once
- set key: Sem=1
- take a ticket by CASing and decrementing the number
- release by CASing and incrementing
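- simplified to a single-slot semaphore with the v2 API (my sketch; locksmith's real value is a JSON document and the key path here is made up):
# acquire: only succeeds if the semaphore still reads 1
$ curl "http://127.0.0.1:2379/v2/keys/reboot-semaphore?prevValue=1" -XPUT -d value=0
# ...reboot, rejoin the cluster...
# release: flip it back
$ curl "http://127.0.0.1:2379/v2/keys/reboot-semaphore?prevValue=0" -XPUT -d value=1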
- flannel
- virtual overlay network
- provide a subnet to each host
- handle all routing
- uses etcd to store network configuration, allocated subnets, etc
- skydns
- service discovery and DNS server
- backed by etcd for all configuration and records
- vulcand
- “programmatic, extendable proxy for microservices”
- HTTP load balancer
- config in etcd
- (though actual proxied requests don’t touch etcd)
- confd
- simple config templating
- for “dumb” applications
- watch etcd for changes, render templates with new values, reload
- (sounds like consul-template mentioned in the vault talk?)
- recent improvements (v2)
- asynchronous snapshotting
- append-only log-based system
- grows indefinitely
- snapshot, purge log
- safest: stop-the-world while you do this
- this is problematic because it blocks all writes
- now: in-memory copy, write copy to disk
- can continue serving while you purge the copy
- raft pipelining
- raft is based around a series of RPCs (eg AppendEntry)
- etcd previously used synchronous RPCs
- send next message only after receiving previous response
- now: optimistically send series of messages without waiting for replies
- (can these messages be reordered?)
- future improvements (v3)
- “scaling etcd to thousands of nodes”
- efficient and powerful API
- flat binary key space
- multi-object transaction
- extends CAS to allow conditions on multiple keys
- native leasing API
- native locking API
- gRPC (HTTP2 + protobuf)
- multiple streams sharing a single tcp connection
- compacted encoding format
- disk-backed storage
- historically: everything had to fit in memory
- keep cold historical data on disk
- keep hot data in memory
- support “entire history” watches
- user-facing compaction API
- incremental snapshots
- only save the delta instead of the full data set
- less I/O and CPU cost per snapshot
- no bursty resource usage, more stable performance
- upstream recipes for common usage patterns
- leases: attaching ownership to keys
- leader election
- locking resources
- client library to support these higher level use cases
- server density: monitoring
- the cost of uptime
- expect downtime
- prepare
- respond
- postmortem
- incident example:
- power failure to half our servers
- primary generator failed
- backup generator failed
- UPS failed
- automated failover unavailable
- (known failure condition)
- manual DNS switch required
- expected impact: 20 minutes
- actual impact: 43 minutes
- human factor
- unfamiliarity with process
- pressure of time sensitive event (panic)
- escalation introduces delays
- documented procedures
- checklists! ✓
- not to follow blindly – knowledge and experience still valuable
- independent system
- searchable
- list of known issues and documented workarounds/fixes
- checklists – why?
- humans have limitations
- memory and attention
- complexity
- stress and fatigue
- ego
- pilots, doctors, divers:
- Bruce Willis Ruins All Films
- checklists help humans
- increase confidence
- reduce panic
- realistic scenarios for your game day
- replica environment
- or mock command line
- record actions and timing
- multiple failures
- unexpected results
- simulation goals
- team and individual test of response
- run real commands
- training the people
- training the procedures
- training the tools
- postmortem
- failure sucks
- but it happens, and we should recognize this
- fearless, blameless
- significant learning
- restores confidence
- increases credibility
- timing
- short regular updates
- even “we’re still looking into it”
- ~1 week to publish full version
- follow-up incidents
- check with 3rd party providers
- timeline for required changes
- content
- root cause
- turn of events which led to failure
- steps to identify & isolate the cause
- http://www.slideshare.net/bobtfish/empowering-developers-to-deploy-their-own-data-stores
- https://github.com/bobtfish/AWSnycast
- puppet data in modules
- this is amazing. it changed our lives
- apply a regex to the hostname (eg search1-reviews-uswest1aprod) to parse out the cluster name: elasticsearch_cluster { 'reviews': }
- developers can create a new cluster by writing a yaml file
- pull the data out of the puppet hierarchy
- reuse the same YAML for service discovery and provisioning
- puppet ENC - external node classifier
- a script called by puppetmaster to generate node definition
- our ENC looks at AWS tags
- cluster name, role, etc
- puppet::role::elasticsearch_cluster => cluster_name = reviews
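- an ENC is just an executable that prints YAML for the node it’s given; a hypothetical sketch of ours (the AWS tag lookup is elided):
#!/bin/sh
# $1 is the node's certname; the real script resolves it to EC2 tags first
cat <<EOF
classes:
  puppet::role::elasticsearch_cluster:
    cluster_name: reviews
environment: production
EOF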
- stop needing individual hostnames!
- host naming schemes are evil!
- silly naming schemes (themed on planets)
- “sensible” naming schemes (based on descriptive role)
- do you identify mysql master in hostname?
- what happens when you failover?
- customize your monitoring system to actually tell you what’s wrong
- “the master db has crashed” v “a db has crashed”
- terraform has most of the pieces
- it’s awesome
- as long as you don’t use it like puppet
- roles/profiles => sadness
- treat it as a low level abstraction
- keep things in composable units
- add enough workflow to not run with scissors
- don’t put logic in your terraform code
- it’s a sharp tool
- can easily trash everything
- it’s the most generic abstraction possible
- map JSON (HCL) DSL => CRUD APIs
- it will do anything
- as a joke I wrote a softlayer terraform provider which used twilio to phone a man and request a server to be provisioned
- cannot do implicit mapping
- but puppet/ansible/whatever can?
- “Name” tag => namevar
- Only works in some cases - not everything has tags!
- implicit mapping is evil
- eg: puppet AWS
- in March 2014, I wanted to automate EC2 provisioning
- I could write a type and provider in puppet to generate VPCs
- @garethr stole it and it’s now puppet AWS
- BUG - prefetch method eats exceptions (fixed now)
- you ask AWS for all VPCs up front (in prefetch)
- if you throw an exception while prefetching, it was silently swallowed
- so it looks like there are no VPCs
- now you generate a whole bunch of duplicates
- workaround: an exception class with an overridden to_s method which would kill -9 itself
- works, but not pretty
- I wouldn’t recommend puppet-aws unless you’re on puppet 4 which fixed this bug
- terraform modules
- reusable abstraction (in theory)
- sharp edges abound if you have deep modules or complicated modules
- these are bugs and will be fixed
- you can’t treat terraform like puppet
- use modules, but don’t nest modules
- use version tags
- use other git repos
- split modules into git repos
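- (my sketch; the repo URL and tag are made up) consuming a module pinned to a release tag in its own repo:
$ cat > search.tf <<'EOF'
module "search_es" {
  source = "git::https://git.example.com/ops/tf-elasticsearch.git?ref=v1.2.0"
}
EOF
$ terraform get    # fetches the pinned module source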
- state
- why even is state?
- how do you cope with state?
- use hashicorp/atlas
- it will run terraform for you
- it solves these problems
- we.. reinvented atlas
- workflow (locking!) is your problem
- if two people run terraform concurrently, you’ll have a bad time
- state will diverge
- merging is not fun
- use hashicorp/atlas
- split state up by team
- search team owns search statefile
- S3 store
- many read, few write
- wrap it yourself (make, jenkins, etc)
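- (my sketch; bucket/key names are made up and the exact flags vary by terraform version) wiring up the S3 remote state looked roughly like:
$ terraform remote config -backend=s3 \
    -backend-config="bucket=acme-terraform-state" \
    -backend-config="key=search/terraform.tfstate" \
    -backend-config="region=us-west-1"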
- don’t install terraform in $PATH
- you don’t want people running terraform willy-nilly
- jenkins to own the workflow
- force people to generate a plan and okay it
- people aren’t evil, but they will take shortcuts
- if they can just run terraform apply without planning first, they will do so
- protecting me from myself
- “awsadmin” machine + IAM Role as slave
- Makefile based workflow
- jenkins job builder to template things
- you shouldn’t have shell scripts typed in to the jenkins text boxes
- split up the steps
- refresh state, and upload the refreshed state to S3
- plan + save as an artefact
- filter plan!
- things in AWS that terraform doesn’t know about
- lambda functions which tag instances based on who created them
- terraform doesn’t know these tags, so will remove them
- we filter this stuff out
- approve plan
- apply plan, save state
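- the core of that plan/approve/apply flow (my sketch):
$ terraform plan -out=search.tfplan    # the saved plan is the artefact that gets reviewed
$ terraform apply search.tfplan        # applies exactly the approved plan, nothing else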
- nirvana
- self service cluster provisioning
- devs define their own clusters
- 1 click from ops to approve
- usage gets accounted to the owning team
- aws metadata added as needed
- all metadata validated
- clusters built around best practices
- and when we update best practices, clusters get updated to match
- can abstract further in future
- opportunities to do clever things around accounting
- dev requested m4.xlarges, but we have m4.2xlarges as reserved instances
- slides: http://ow.ly/XPkvT
- InSpec: Infrastructure Specification
- v similar to server-spec
- started on top of the server-spec project
- code breaks
- normal accident theory
- why?
- reduce number of defects
- security and compliance testing
- test any target
- bare metal / VMs / containers
- linux / windows / other / legacy
- tiny howto
- install from rubygems, or clone git repo
- (see slides)
- test local node
- test remote via ssh
- (no ruby / agent on node)
- test remote via winRM (still no agent)
- test docker container
- example test
describe package('wget') do
it { should be_installed }
end
describe file('/fetch-all.sh') do
it { should be_file }
its('owner') { should eq 'root' }
its('mode') { should eq 0640 }
end
inspec exec dtest.rb -t docker://f02e
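- the other targets look much the same (my sketch; hosts and credentials are made up):
$ inspec exec test.rb                                      # local node
$ inspec exec test.rb -t ssh://root@search1.example.com    # remote over ssh, no agent
$ inspec exec test.rb -t winrm://Administrator@win1 --password '...'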
- run via test-kitchen (kitchen-inspec)
- demo: solaris box running within test kitchen (on vagrant)
- test-kitchen verify normally takes a long time because it installs a bunch of stuff on the box
- much faster to verify with inspec
- solaris test:
describe os do
it { should be_solaris }
end
describe package('network/ssh') do
it { should be_installed }
end
describe service('/network/ssh') do
it { should be_running }
end
describe file('/etc/ssh/sshd_config') do
its(:content) { should match /Protocol 2/ }
end
this regex is brittle. comment? prefix/suffix?
Better:
describe sshd_config do
its('Protocol') { should cmp 2 }
end
custom resources help with this.
- “containers suck too”
- “docker security is a mess!”
- physical separation?
- “images on docker hub are insecure!”
- just community contributions
- lots of docker images contain bash with shellshock vulnerability
- docker images are artefacts, treat them like vmdk/vhd/vdi/deb/rpm
- build your own lightweight base images
- use base images without lots of userland tools if possible (eg alpine linux)
- dockerfile is “return of the bash”
- over-engineered dockerfiles
- replace large shell scripts with CM running outside the container
- what we want is configuration management with a smaller footprint
- avoid requiring ruby/python/etc inside the container just to get your CM tool running
- scheduling/orchestration is a whole new area
- http://rexify.org - a perl-based CM tool
- “it doesn’t matter how many resources you have, if you don’t know how to use them, it will never be enough”
- use cases
- continuous integration & delivery
- AIX was first released in 1987 (?)
- I first came in as an engineering manager, but I knew nothing about this platform
- some quirks, some pains
- Test Kitchen support – rough and unreleased
- traditional management of AIX:
- manually
- SMIT - menu-driven config tool
- transforming old-school shops has two routes:
- migrate AIX to linux, then automate with chef
- manage AIX with chef, then migrate to linux
- second route is easier as it abstracts away the OS so there’s less to learn at each step
- challenges
- lack of familiarity with the platform, hardware, setup
- hypervisor-based but all in hardware
- XLC - proprietary compiler
- if you use XLC, output is guaranteed forward-compatible forever
- binaries from 1989 still run on AIX today
- can’t use GNU-isms
- no bash, it’s korn shell
- no less!
- no real package manager
- bff
- you can use rpm
- but no yum or anything
- two init systems (init and SRC)
- key features which are missing!
- virtualization features
- sometimes cool, sometimes not
- platform quirks & features
- all core chef resources work out of the box on AIX
- special resources in core
- bff_package
- service - need to specify init or SRC, some actions don’t work
- more specific AIX resources in the aix library cookbook
- manage inittab etc
- chef’s installer is sh-compatible which is necessary for AIX
- the file at https://www.chef.io/chef/install.sh doesn’t use bash-isms
- starts with this comment:
# WARNING: REQUIRES /bin/sh
#
# - must run on /bin/sh on solaris 9
# - must run on /bin/sh on AIX 6.x
#
- future work
- other POWER platform support
- chef server on POWER
- chef client for linux on System/z
- links
rkt and Kubernetes: What’s new with Container Runtimes and Orchestration, Jonathan Boulle @baronboulle
- appc pods ≅ kubernetes pods
- rkt
- simple cli tool
- no (mandatory) daemon
- a big daemon running as root doesn’t feel like the best default setup
- no (mandatory) API
- bash/systemd/kubelet -> rkt run -> application(s)
- stage0 (rkt binary)
- primary interface to rkt
- discover, fetch, manage app images
- set up pod filesystems
- manage pod lifecycle
- rkt run
- rkt image list
- stage1 (swappable execution engines)
- default impl
- systemd-nspawn+systemd
- linux namespaces + cgroups
- kvm impl
- based on lkvm+systemd
- hw virtualization for isolation
- others?
- rkt TPM measurement
- used to “measure” system state
- historically just used to verify the bootloader/OS
- CoreOS added support to GRUB
- rkt can now record information about running pods in the TPM
- tamper-proof audit log
- rkt API service
- optional gRPC-based API daemon
- exposes information on pods and image
- runs as unprivileged user
- read-only
- easier integration
- recap: why rkt?
- secure, standards, composable
- rktnetes
- using rkt as the kubelet’s container runtime
- a pod-native runtime
- first-class integration with systemd hosts
- self-contained pods process model -> no SPOF
- multiple-image compatibility (eg docker2aci)
- transparently swappable container engines
- possible topologies
- kubelet -> systemd -> rkt pod
- could remove systemd and run pod directly on kubelet (kubelet -> rkt pod)
- using rkt to run kubernetes
- kubernetes components are largely self-hosting, but not entirely
- need a way to bootstrap kubelet on the host
- on coreos, this means in a container (because that’s the only way to run things on coreos)..
- ..but kubelet has some unique requirements
- like mounting volumes on the host
- rkt “fly” feature (new in rkt 0.15.0)
- unlike rkt run, doesn’t run in pod; uncontained
- has full access to host mount (and pid..) namespace
- rkt networking
- plugin-based
- IP(s) per pod
- container networking interface (CNI)
- CNI was just another plugin type, but soon to be the kubernetes plugin model
- http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html
- aside: use letsencrypt, please, blog.kubernetes.io!
- future
- rkt v1.0.0
- soon....
- rktnetes 1.0 2016Q1
- fully supported, full feature parity, automated testing on coreos
- rktnetes 1.0+
- lkvm backend by default
- native support for ACIs
- tectonic trusted computing
- https://coreos.com/blog/coreos-trusted-computing.html
- kubelet upgrades
- mixed-version clusters don’t always work
- (eg api from 1.0.7 to 1.1.1: https://coreos.com/blog/coreos-trusted-computing.html )
- solution: API-driven upgrades
- summary
- use rkt
- use kubernetes
- get involved and help define future of app containers
- does use of KVM mean non-linux hosts can be run inside?
- currently no
- image format for a registry?
- we don’t have a centralized registry
- we want to get away from that model
- can rkt run docker images?
- yes
- the current kubernetes api only accepts docker images so it’s the only thing it can run
- what do I have to actually do to use rkt?
- there’s a using rkt with kubernetes guide