- Chatham House Rule, so no attribution of ideas to people or companies
 
- bootstrapping environments (without object stores)
 - service discovery
 - removing spofs
 - modern monitoring – sensu, runbooks, dashboards
    
- tradeoff between ease of management and sophistication
 - elastic sites?
 
 - surviving DDoS attacks when your site is transactional
 - modern cmdbs
 - ansible
 - icinga re-acknowledgement
    
- ie I know disk is critical at 10%, but please re-alert at 5%
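
A minimal sketch of the re-alert idea above, not icinga configuration. It assumes the percentages refer to free disk space, and the `ReAlerter` class and its threshold values are hypothetical names for illustration. Each lower threshold fires once when crossed, even if an earlier one was already acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class ReAlerter:
    """Fire again each time free space drops through a lower threshold."""

    # Percent free, in descending order (assumed values taken from the notes).
    thresholds: tuple = (10.0, 5.0)
    _fired: set = field(default_factory=set, init=False)

    def observe(self, free_pct: float) -> list:
        """Return the thresholds newly crossed by this reading."""
        # Recovering above a threshold re-arms it for next time.
        self._fired = {t for t in self._fired if free_pct <= t}
        newly = [t for t in self.thresholds
                 if free_pct <= t and t not in self._fired]
        self._fired.update(newly)
        return newly
```

Calling `observe(9.0)` and then `observe(4.0)` on the same instance would produce one alert at the 10% threshold and a second one at 5%, with nothing in between.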
 
 
- big infrastructure
    
- shared web servers
 - shared tomcat servers
 - zenoss over snmp
        
- snmp didn’t scale
 
 - problem: everything is averaged over 5 minutes
        
- teams are spinning up their own graphite instances to monitor their own stuff
 
 - zenoss required 40 boxes, I expected 2
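
To illustrate why 5-minute averaging is a problem, here is a small sketch with made-up per-second readings (the numbers are hypothetical, not from any real system). A 10-second spike to 100% almost vanishes in the average.

```python
from statistics import mean

# Hypothetical per-second CPU readings over 5 minutes:
# idle at 5% except for a 10-second spike to 100%.
samples = [5.0] * 290 + [100.0] * 10

five_minute_average = mean(samples)  # what a 5-minute rollup would report
peak = max(samples)                  # what actually happened

print(f"average={five_minute_average:.2f} peak={peak:.0f}")
```

The rollup reports roughly 8%, which looks healthy, while the peak shows the box was saturated for ten seconds.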
 
 - what does graphite look like at scale?
    
- protip: buy fusion io
 - it can be hard to rebalance your metrics
        
- particularly if you’re using consistent hashing
 
 - carbonate for migrating data to another graphite server
        
- though you’ll probably end up with downtime
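
A rough sketch of why rebalancing hurts with consistent hashing. This is a generic hash ring, not carbon's actual implementation; `HashRing`, `moved_metrics`, the replica count and the node names are all assumptions for illustration. Adding a node changes the owner of a slice of metrics, and the data already on disk for those metrics then has to be migrated.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, metric):
        """Return the node owning the first ring position after the metric."""
        idx = bisect.bisect(self._positions, self._hash(metric))
        return self._ring[idx % len(self._ring)][1]


def moved_metrics(metrics, old_nodes, new_nodes):
    """List the metrics whose owning node differs between two clusters."""
    old, new = HashRing(old_nodes), HashRing(new_nodes)
    return [m for m in metrics if old.node_for(m) != new.node_for(m)]
```

For example, `moved_metrics(names, ["a", "b", "c"], ["a", "b", "c", "d"])` returns only the metrics that now belong to `d`; those are the ones whose history would need copying across.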
 
 
 - has anyone used skyline?
    
- we looked at it, but we got lots of false alerts
        
- my suspicion is that if we understood maths better, we could make it work really well
 
 
 - in sensu-community-plugins, there’s a check-graphite
    
- it does nice things like exceeding N std deviations
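
A minimal sketch of an "exceeds N standard deviations" check. This is not the check-graphite plugin's code; the function name and the default of 3 deviations are assumptions. It compares the latest datapoint against the mean and sample standard deviation of recent history.

```python
from statistics import mean, stdev


def exceeds_n_stddev(history, latest, n=3.0):
    """True if `latest` is more than n sample std deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate a deviation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat series: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) > n * sigma
```

In practice the history would be the last few datapoints fetched from graphite for a single metric, and the result would drive a warning or critical exit status.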
 
 - what do people use below graphite?
    
- we’re using collectd
        
- the latest stuff has statsd and jmx connectors
 
 
 - anyone using ganglia?
    
- we’re replacing ganglia with sensu stuff and diamond
        
- why are you using diamond rather than the in-built sensu stuff?
            
- because we’re a python shop
 
 - we push data over rabbitmq
 - and fan in to a big fat central fusionio graphite
 - how do you monitor rabbit?
            
- sensu monitors rabbit using rabbit
 - there are healthchecks which should fire if rabbit is completely broken
 - we have a cron job on every rabbit and every sensu server to
                kill the process every hour
                
- and it still works
 
 
 
 - is anyone using riemann?
    
- is it worth spending time with?
 - where does it add value?
        
- real time anomaly detection
 - it does events as well as numbers
 - it also has event timeouts – it can notify on the absence of events as well as their presence
 - I think you could replace statsd with riemann
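
A sketch of the "notify on absence" idea, loosely modelled on events carrying a time-to-live. This is plain Python rather than riemann configuration; the `ExpiryIndex` class and its method names are hypothetical. Each event pushes back a deadline, and anything past its deadline is reported as gone quiet.

```python
class ExpiryIndex:
    """Track the latest event per (host, service) and report silent ones."""

    def __init__(self):
        self._deadline = {}  # (host, service) -> time after which it is stale

    def record(self, host, service, now, ttl):
        """Note that an event arrived at `now` and is valid for `ttl` seconds."""
        self._deadline[(host, service)] = now + ttl

    def expired(self, now):
        """Return and forget every (host, service) that has missed its deadline."""
        gone = sorted(k for k, d in self._deadline.items() if d < now)
        for key in gone:
            del self._deadline[key]
        return gone
```

A heartbeat that stops arriving shows up in `expired()` once its ttl has passed, which is the hook for an alert.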
 
 
 - does anyone store second or subsecond data for a long time?
 - we have a single biggest day each year
    
- we snapshot everything for that day - stats, logs, etc
 - use it to drive load testing for the next year
 
 - we’ve been trying redshift
 - how does elasticsearch cope with metrics?
    
- we push quite large documents about everything to do with a web request
 - I often find log data in kibana much more useful than the same data in graphite
 - does anyone use realtime queries to drive alerting from
        elasticsearch?
        
- yes, from graylog2
 
 
 - one thing we’ve done recently is tune down the number of io
    operations that carbon performs per second, which massively reduces disk
    usage
    
- or write to ram disk and sync once per minute
 
 - how do you get devs to make more metrics available?
    
- you put them on call until they do
 
 - do people cull metrics at all?
    
- i never have enough data
 
 - do people have app metrics measured by their continuous delivery
    pipeline?
    
- our apps publish an xml document which is a schema of the types of metrics that they can publish
 
 - if I don’t hate myself, is there anything other than sensu I
    should use for monitoring that environment?
    
- does anyone rely on cloudwatch?
        
- we use it as a source for some data (ELB metrics)
            
- you can get these delivered into S3 these days
 
 - but it only stores data for two weeks
 
 - does anyone using sensu miss nagios tactical view?
    
- I miss having a decent dashboard
        
- I don’t miss the 10 different nagioses per environment
 - I don’t miss the failover when we lost the primary nagios instance and all the state in it
 
 - we wrote a dashboard to query nagios and sensu
 
 - from the internet peanut gallery: is anyone using circonus?
 
- my agenda:
    
- the presence of artefacts I don’t necessarily own
        
- large graphical images or video data
 - third party applications
 
 - I may wish to release the same artefact multiple times
        
- we’ll use oracle 11 everywhere at one patch level
            
- but in different configurations
 
 
 - windows images (VDIs)
 
 - fpm is useful
    
- but it never generates a spec file or a source rpm
 - makes me uncomfortable
 
 - I’m not happy about rpms, because you can only have one version of
    one package installed at once
    
- eg a simple webapp where we don’t want to do the loadbalancer dance
 - that also implies the app is relocatable which vendor binaries often aren’t
 
 - is containerization part of the solution?
    
- it allows you to have multiple overlapping filesystems
 - a model: each customer has their own container
        
- we haven’t done it
 - that sounds very expensive
 
 - how do you version control containers?
        
- do you treat them as a single binary?
 - do you reconstruct it?
 
 - a lot of solutions assume all machines are stateless
        
- someone else will deal with the databases
 
 - containers allow you to minimize surprise
        
- a DBA logging into your container can find things where they expect, even if it’s from an underlying frankenstein filesystem
 
 - I don’t mind snapshots, but they should be generated mechanically and repeatably.
 
 - what tool would you love to exist in an ideal world?
    
- I’d like the deployment database to effectively do dependency
        injection
        
- I know where the dependencies are and what data I’m injecting, so I can use system monitoring to know what I’ve deployed
 
 
 
- HTTP isn’t the best protocol in the world
 - use queues!
 - refactoring and testing is a better solved problem within the
    python programming language than over the network
    
- I don’t think it’s hard to test µservices
        
- there are clear contracts
            
- that’s the theory, right?
 
 
 - we end up building lots of small monoliths and wiring them together
 - we switched to using amazon SNS to manage notifications
 - how you get your ops team to support µservices is you get them to
    support as little as possible
    
- they only work when the functional team owns the whole stack right to the bottom
 
 - services have a life cycle
    
- we like building things
 - we should get better at killing things when they’re not being used
 
 - is there an additional cost to the organization for running
    µservices?
    
- is there an organizational cost to having a 2 million line codebase?
 
 - ownership of services
    
- handover of building team to ongoing running team
 - problems can get pushed back to the building team
 
 - antipattern around µservices:
    
- developers think they’re clever
 
 - ntp is a µservice
 - aren’t µservices and SOA the same thing?
    
- is it SOA done right?
 
 
- how do you deal with PRs?
    
- what about things that are not on your roadmap?
        
- by not having a very good roadmap?
 
 - or moving in directions you don’t want to go?
 - it can be awkward because people might have put a lot of work in
        
- but you need to explain “if you want to do that you need to fork it”
 - you can try to avoid it by writing a decent rationale of what you’re trying to do
 - though you can’t answer all the questions up front
 
 
 - you want to optimize for dragging people into your community
    
- as the implementer, your documentation is going to be awful
 - because you already understand the whole system and don’t understand when you’re assuming tacit knowledge
 - whereas if you can attract users to your irc channel, and answer their questions really clearly, they can write great docs for you
 - I try to have a policy of: if anything confuses you, here’s my email, twitter, irc, etc and I will try to help you
 - encourage people to raise bugs against docs
 - I come from the perl community
        
- there are 10-15 year old projects there where the maintainer has changed 4-6 times
 - have you got an example?
            
- Catalyst
                
- ~200 repos (core + plugins)
 - ~450 active committers
 
 
 
 
 - plugins are interesting: if people are trying to pull the project in different directions, you can let them through extensions but keep the core very small
 - does anyone have experience of running OSS projects at work?
    
- how do you manage your time?
        
- the important PRs to pay attention to are those from new
            contributors
            
- certainly get back within 24 hours
 - don’t necessarily have to merge
 
 
 - why are you open sourcing this code?
    
- to get the community using it
 - to get good publicity
 
 - do you have an OSS landing page?
    
- yes, but it’s out of date
 
 - the OSS stuff that has mostly been infrastructure-related we’ve been trying to put into a separate github org
 - you imply some level of support here
    
- running an OSS project is more than just making code open
 - to be able to do that successfully, you need to at least mentally divest yourself from your parent organization
 
 - what do you do if that project isn’t your main focus?
    
- my OSS contributions are entirely selfish
 - you need a maintainer
        
- there needs to be clear communication channels
 
 
 - what does a maintainer do?
    
- is it always one person?
        
- no! not if you can avoid it
 - once a project has a community it’s difficult for one person to maintain
 - even if you’re not writing code, managing the community can rapidly become a full-time job
 
 - what about the cost of maintenance?
        
- use travis!
 - but please review the contribution even if the contribution passes the tests
 
 - problem of selectivity, vision and direction
        
- in the early days, mozilla just accepted everything.
 - ended up having to rewrite as firebird (now firefox)
 
 
 - how do you ensure governance doesn’t become onerous?
    
- example of people who forked their own project after it had become an apache project
 - example of gcc fork (egcs) which got merged back in
 
 - a lot comes back to documenting your original vision
    
- I’ve been added as a maintainer in places, and sometimes there’s clear advice and sometimes there isn’t.
 
 - open sourcing a project that you don’t use is a recipe for
    abandonware.
    
- we also have an organization for abandoned code to move it out of our main github org
 
 - forks
    
- how do you transfer maintainership?
 - what happens if a project gets abandoned and then forked?
 
 - what are the good communication channels to have for an OSS
    project?
    
- own website for announcement and discovery?
        
- how do you summarize your project?
 - peeve: like <other project> but X
 
 - community of contributors comes from community of users
        
- so good user documentation will foster contributors
 
 - issues
        
- is it worth seeding the issues list even if we have an internal tracker?
 - yes, because it helps users google for error messages
 - they are effectively documentation
 - do you move to only use the external tracker or do you have an internal tracker too?
 
 - do you need a security contact?
        
- yes, with a GPG key
 
 - people need to see activity
        
- if all your activity is on your internal tracker & mailing list & private irc, people will think it’s dead
 
 - where do people host mailing lists?
        
- google groups
 
 - a few people are averse to irc
        
- people don’t realise that they won’t necessarily get an immediate response
 - irc shouldn’t be used alone
 - timezones are also an issue
 
 - ipython uses hangouts
 - gmane: a newsgroup view on your mailing list
 - don’t have a separate irc channel per project if you’re managing lots of projects
 
 - how do you host your docs?
    
- you should control your domain?
 - when is a README not enough?
 - start with github pages, and you can migrate later
 - what should it have?
        
- screenshots
 - getting started guides
 
 - github pages are a bad idea because you can’t version them
        
- readthedocs keeps old versions too
 
 - contributions must update docs when they update behaviour
 
 - documentation & communication is super super important
    
- careful with contributions from newbies
        
- rejecting a contribution because of lack of tests can be
            tricky
            
- they might not have written many tests in general
 - they might not understand your particular test framework
 
 - but rejecting because of no docs is more reasonable
 - you can write tests for them
            
- and use this as a communication channel
 - “does this test look like it’s measuring the thing you’re trying to build?”
 
 
 - how do you handle trolls, griefers and timewasters?
    
- one small doc patch earns you a hundred stupid questions
 - love your idiots
 
 
- what’s arrived? what’s died?
 - Big Data is now a thing people talk about
    
- you’re now seeing adverts on the tube about it
 
 - is couchdb dead?
    
- npm?
 - we still use it, but we only used it as a key-value store
 
 - still going:
    
- mongo
 - riak
 
 - websockets are now standardized and supported by lbs, proxies
 - edgeconf
    
- grunt and pig and oink and stuff
 - doing a js build and running tests
 - angularjs
 
 - ndoc has gone
 - flash is in its death throes
 - most video sites work on an ipad
 - webgl has taken hold
 - epic demoed unreal engine 4 in firefox
 - 60 fps on the web
 - docker!
    
- although solaris has been doing it for yonks
 
 - golang has taken off
    
- when did go hit 1.0?
 - people are rewriting individual bits in go (rather than everything)
 
 - is hacker news dead yet?
 - bitcoin happened
    
- VPS providers have been getting attacked by people trying to steal them
 - people trawling github to find access keys
 - bitcoin mining in the browser
 
 - erlang
    
- nobody’s started writing things in it
 - though there’s elixir
 - and julia
 - and idris
 
 - what’s falling out of favour?
    
- ruby? no
 - scala? no
 
 - facebook’s hack
    
- seems sensible if you’re already in a php environment
 
 - bittorrent
    
- an incredibly good way of saturating your network
 - though this isn’t new
 
 - µservices
    
- just due to containerization?
 - seems to be a bunch of ex-tw people
 
 - elasticsearch is now usable
    
- and quite good
 - and they acquired logstash and kibana
 
 - logs being searchable in es
    
- splunk has a reasonable oss competitor
 
 - graphite has grown
    
- there’s experimentation going on there
        
- storage backends (cassandra, leveldb)
 
 
 - what about lucene?
    
- very few people use it directly these days
 
 - snowden
 - DC security
 - https everywhere
    
- gmail is now ssl only
 - PFS
 - the perception that TLS is expensive
 - spdy
 
 - webp
 - IE6 is on its deathbed
 - winxp
    
- though it’s still in cash terminals
 
 - mobile growth
    
- many sites are on the edge of 50% mobile
 - talk of mobile first and now mobile only
 
 - 4G
 - bootstrap
 - wearables & IoT
    
- fitbit
 - pebble
 - automotive
        
- tesla motors
 
 
 - security updates
    
- wordpress now has autoupdate
 
 - nagios isn’t dead yet
    
- sensu is still the hot new thing
 - riemann
 - flapjack
 
 - desktops are going away
    
- except for gaming
 
 - centos is now owned by redhat
 - linux mint?
 - systemd
 - ubuntu as a server is now more probable
    
- is upstart going away?
 
 - postgres got built-in replication
 - graph dbs (neo4j)
 - paas
    
- people are still excited
 - it got even more complicated to install your own
 
 - where’s node going?
 - streaming extensions
    
- rx in .NET
 - rise of functional
 
 - linux on the desktop?
    
- the XPS13 is good
 - the rise of chromebooks
 
 - openstack?
    
- everyone thinks it’s a great idea
 
 - private clouds?
    
- azure will sell you an on-premise cloud thing
 - what’s the difference between an in-house cloud and a data centre?
 
 - drones, quadcopters, hexapods
    
- for filming
 
 - what’s coming up?  what will be important at the next scale
    summit?
    
- net security is in flux
 - forks of android will be the new linux distro
 - http 2
 - IPv6?
 - anomaly detection
 - software defined networks
 - containerization
 - silicon roundabout?
        
- it’s not a playground for children anymore
 - the adults have taken over
 
 - computing in government
        
- US has 18F
 - GDS
 
 - I’d like there to be a world-class home grown east london
        startup doing technically challenging stuff
        
- startups which solve technical problems don’t generally get funded
 - acquisitions
 
 - crowdfunding?
        
- no one cares
 
 
 - what’s going to die?
    
- couchdb
 - python 2 will not die
 
 
- how do we hire & train new people into our industry?
 - we certainly have struggled to recruit
    
- we’ve come to the realization that part of the solution is hiring junior people & growing them into the role
 - I’ve been asked to mentor a junior person but I’ve no idea what to do
 
 - I’m a recent junior
    
- one on one time is quite good
 - I came in having a basic idea what I’d be doing
 - be open for questions
        
- the devops world is really overwhelming
 - it’s so useful to be able to ask things
 
 - that’s one of the ground rules we’ve agreed on
        
- ie that I’m interruptible
 
 - we’ve certainly noticed that hiring in the junior area is useful
 
 - it’s great having juniors because you get chaos monkeys as well
    
- if you’re not prepared to let a junior touch something, you probably need to make it more resilient
 
 - ETO1: 12-week night course
    
- teaches you how to teach
 
 - how do you get the theory?  how do you talk about underlying
    principles that are independent of the particular situation at
    hand?
    
- pair programming is really good for that
        
- does that depend on the teaching style of the pair?
 
 - make the junior document the things that you’re teaching them
        
- it helps ensure that they’ve understood it
 
 
 - I get irritated when technical people tweet complaining about the
    cost of interruptions
    
- when you have new people, you have to empower them to interrupt
 - I don’t think you should have your entire team mentor a new starter
 - we use the red flag system
        
- you put a red flag up if you don’t want to be interrupted
 
 - designated interruptible person
 - juniors also have a difficult time saying no
        
- you want to make everyone happy and be helpful
 
 - do you have a system that makes work visible?  eg kanban
        
- we have a helpdesk system
 - but external people don’t use it for smaller tasks
            
- raise a ticket on their behalf
 
 
 - how do we teach juniors that it’s ok to say no?
        
- also, how to understand what the requestor is trying to achieve, rather than the specific task they want done, and recognize when it’s the wrong fit?
 
 
 - juniors are way more engaged if they get a choice (however constrained) on what they get to spend their time on
 - also allow people to fail
    
- teach them that it’s okay to fail
 - I troll my junior developers sometimes
        
- I lead them down the garden path
 - but then I’m there to pick up the pieces when they fail
 
 - do something that’s visible to other people in the company
        
- so that they can show people what they’re capable of
 
 
 - how do you direct people through different areas of knowledge?
    
- do you go shallow on lots of tools? Or really deep on one thing?
 - depends on the junior
        
- throw things at them and see what sticks
 
 - go broad with the concepts early on
        
- architecture, system, etc
 
 
 - onboarding
    
- desk & computer should be ready
 - first week should be meeting all the people they need to know about
 - have monthly checkins with the mentor
        
- checkins, not reviews!
 
 - get a sales person to give a demo of whatever it is you build
 
 - can anyone recommend useful resources for managing developers?
    
- how to talk to your kids or something like that
 
 - how do you improve diversity?
    
- how do juniors find your roles?
 - you don’t have to stick to the same old networks when hiring juniors
 - thoughtbot – structured apprentice schemes
 - I wonder if it would help to be more explicit & realistic in job
        postings about the experience required and the salaries on offer?
        
- recruiters muddy the waters a lot
 - go direct if you can
 
 
 - how do you know when to stop mentoring? and how do you measure success?
 
- docker + 150 lines of shell
 
- mirroring cpan, rubygems, npm
 - filesystems are good at serving things that look like files
 - you don’t need to use couch or
 - what was the easiest to mirror?
    
- cpan – it has a single line rsync command to create a mirror
 
 - wikipedia is hard to mirror
    
- each wikimedia site has a different set of plugins
 
 
- it’s important to have good search for your site
 - we use google analytics. you can use this to find click behaviour
    for particular search terms
    
- ie for term X, how often do people click on link 1, 2, 3, 4, etc
 
 - automate this!
 - crunch the most popular searches
 - identify how many clicks they got
 - use it to calculate how many more clicks we would have got if we had ordered the results better
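
One way the last step could be sketched, under a simple position-bias assumption that is ours rather than the speaker's: each result position has a fixed chance of being looked at (`examine_prob`, hypothetical values), and a result's appeal is its clicks divided by how often it was looked at. Re-sorting by appeal gives the expected clicks under a better ordering.

```python
def extra_clicks_if_reordered(clicks_by_position, impressions, examine_prob):
    """Estimate additional clicks if results were sorted by estimated appeal.

    clicks_by_position: observed clicks on the result shown at each position
    impressions: number of times the search results were shown
    examine_prob: assumed chance a user looks at each position (descending)
    """
    # Appeal of each result, correcting for how visible its position was.
    appeal = [c / (impressions * p)
              for c, p in zip(clicks_by_position, examine_prob)]
    # Put the most appealing results in the most visible positions.
    best_order = sorted(appeal, reverse=True)
    expected = impressions * sum(p * a for p, a in zip(examine_prob, best_order))
    return expected - sum(clicks_by_position)
```

With 1000 searches, clicks of `[100, 300, 50]` and examine probabilities of `[1.0, 0.5, 0.25]`, the second result is clearly the one people want, and the estimate comes out at about 275 extra clicks from promoting it.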
 
- juju is a service orchestration tool
 
- apple, facebook employees hacked via website malware, java vulnerability
 - data in transit protection
 - data at rest protection
 - authentication
    
- user to device, user to service, device to service
 
 - secure boot
    
- firmware
 
 - platform integrity and app sandboxing
 - app whitelisting
    
- although the key here is to ensure that the whitelist doesn’t take too long to modify for new things
 
 - security policy
 - sounds like configuration management
 - external interface protection (firewalls)
 - device update policy
 - incident response
    
- things will go wrong
 
 - although don’t worry too much about this
    
- unless you have to.
 
 
- scale using libraries
 - a library has all the modularity properties that services have
 - except you don’t need to worry about the network going down
 
- august 29th for 3 days
 - go here