Created
November 21, 2017 03:08
ADDO '17 notes
Mark Watts' notes on All Day DevOps. | |
==================================== | |
The talks I viewed and the notes I took reflect my personal interests and
issues which I thought would be worthwhile to my employer, but there were many
other useful talks.
One of the business-side advantages of DevOps is that it's supposed to *save | |
time and money* by reducing overhead due, principally, to dysfunctional | |
patterns of behavior within the organization. Many talks in ADDO describe tools | |
that are intended to solve common problems in software development and | |
deployment. In addition, many presenters describe processes of organizational | |
transformation which give insight into possible difficulties and ways to get | |
through or avoid them. | |
As I see it, continuous integration (CI) and continuous deployment (CD) are
*consequences* of an organization that has built up the habits of communication | |
between groups, which habits we label DevOps. Without communication that | |
happens early and often, we end up in situations where software or | |
infrastructure or IT use 'DevOps tools' but they don't realize the full | |
potential of those tools to add value and deliver more rapidly. There are notes
for several tools- and techniques-focused presentations below, but what
matters isn't the tools themselves but the models of product development,
deployment, and maintenance that their use enables.
--- | |
Thanks to the ADDO organizers, presenters, and sponsors! | |
NOTE: ADDO has clipped the talks so you don't have to scrub through looking for | |
a talk like I had to, but the links for the originally recorded blocks are used | |
below since I didn't have the clips at the time. A patch with the updated links | |
is welcome, but I likely will not make the effort since you can find each talk | |
by its title or presenter name pretty easily. | |
Keynote, Jaya Baloo - 3:00 AM, EST | |
Derek Weeks, Mark Miller, Co-Founders, All Day DevOps | |
Jaya Baloo, Chief Information Security Officer, KPN Telecom | |
- This one was mostly about quantum computing | |
- The DevSecOps tie-in was mostly at the end and had to do with | |
anticipating upcoming changes in computing and putting in place defensive | |
security infrastructure and thinking about secrecy of cipher texts that | |
have already been collected. That's a cross-functional concern, so it | |
includes Dev, Ops, and Sec | |
https://www.youtube.com/watch?v=MnevoY_ACD4 | |
https://www.youtube.com/watch?v=-JosVWcYUsI | |
DevSecOps and the DevOps Superpattern - Helen Beal | |
- Beal @ Ranger4 - a DevOps consulting company | |
- History of DevOps | |
- Patrick Debois @ Google
- Paul Hammond & John Allspaw @ Flickr
- Getting Ops / Management more Agile | |
- Newer definition: value stream development | |
- CAMS an acronym to describe what DevOps is | |
Culture | |
Automation | |
Measurement | |
Sharing | |
- "Is DevSecOps a Good Thing?" talk (answer: "yes") | |
- slide: "Is security an afterthought?" (answer: "also yes") | |
- Parts Unlimited Team - simulated software development org | |
Based on "The Phoenix Project" book by Gene Kim, Kevin Behr, and George | |
Spafford | |
(https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262509)
- Presents some anti-patterns in communications with security team | |
ME: seems pretty heavy on things that sec needs to do to 'catch up' vs
things other groups could do to work better with dev (e.g., designing
pipelines in such a way that sec is empowered to not just perform scans,
but also make and deploy patches)
ME: Should point out that the slide is in the same language as the Agile
Manifesto values statement (e.g., "Working software over comprehensive
documentation").
- Everyone has responsibility for security | |
- The DevOps "Super-pattern" (slide 12): https://youtu.be/MnevoY_ACD4?t=701 | |
- DevOps comes from the concepts | |
- Holacracy (portmanteau of "holistic" and "democracy") | |
- ITSM | |
- Agile | |
- Lean | |
- Three-legged stool (theory of constraints) | |
- Learning organization | |
- Safety culture | |
- Harmonious Polygamist Marriage | |
- Table looking at some of the concepts above through CAMS "lenses" | |
- Agile - Daily collab *including with sec*
- Holacracy - A flat organizational structure | |
- Don't shoot, but reward the messenger | |
- Don't isolate sec | |
- Agile Service Management (ASM) | |
- ITIL (IT infrastructure library) processes and procedures | |
- Delivering IT through Agile methods | |
- Just enough governance to deliver value | |
- Sharing vocabularies and intelligence | |
- Lean | |
- Automation can help resolve the security skills gap | |
ME: This slide could be greatly condensed, at least to the points which
are highlighted as Beal moves through it. Splitting out into CAMS
aspects doesn't help me to understand how security can work with other
groups. A general description of the disciplines would suffice for an
orientation, which could then be coupled with the non-obvious CAMS
alignments to lay out the 'superpattern'.
https://www.youtube.com/watch?v=2812-6gdbyA | |
Understand Immutable Infrastructure: What? Why? How? - Quentin Adam | |
- This one was hard to follow at times because of the presentation style, | |
but it seems that Adam's company looked at the problems that
typically cause failures in real architectures and determined that the | |
biggest ones come from human interaction with the system, and in | |
particular, reconfiguration at runtime. | |
- At Clever Cloud, apparently, an SSH login to a server is a 'red alert' | |
- Their methodology involves deploying new instances to change configuration
rather than changing the configuration on a running server. According to
Adam, this means that the state space for a server is reduced from N
variables (e.g., the space of versions of all pieces of software on a
system) to 2 (working / not working). I suppose that using this
methodology would reduce the incidence of 'fat-fingering' a command so | |
that you accidentally take a whole cluster offline. | |
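
To make the "replace, don't reconfigure" idea concrete, here is a minimal sketch. It is not Clever Cloud's tooling: `InstanceSpec`, `Instance`, `Fleet`, and the `healthy` check are all hypothetical names made up for illustration, and the "fleet" is just an in-memory dict. The point is only that a configuration change produces a new spec and new instances, while a running instance is never edited.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class InstanceSpec:
    """Everything that defines an instance (hypothetical); frozen after creation."""
    image: str
    config: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Instance:
    """A launched instance (hypothetical); its spec cannot be reassigned."""
    instance_id: int
    spec: InstanceSpec


class Fleet:
    """In-memory stand-in for a set of running servers."""

    def __init__(self) -> None:
        self._next_id = 1
        self.live: Dict[int, Instance] = {}

    def launch(self, spec: InstanceSpec) -> Instance:
        inst = Instance(self._next_id, spec)
        self._next_id += 1
        self.live[inst.instance_id] = inst
        return inst

    def roll(self, new_spec: InstanceSpec,
             healthy: Callable[[Instance], bool] = lambda inst: True) -> None:
        """Replace each live instance with a fresh one built from new_spec."""
        for old in list(self.live.values()):  # snapshot: skip instances added below
            fresh = self.launch(new_spec)
            if not healthy(fresh):
                # Drop the bad replacement; remaining old instances keep serving.
                del self.live[fresh.instance_id]
                raise RuntimeError("replacement unhealthy; rollout stopped")
            del self.live[old.instance_id]


fleet = Fleet()
v1 = InstanceSpec("app:1.0", (("LOG_LEVEL", "info"),))
fleet.launch(v1)
fleet.launch(v1)

# A config change is a new spec plus a rollout, never an edit on a live server.
v2 = replace(v1, config=(("LOG_LEVEL", "debug"),))
fleet.roll(v2)
```

With this shape, the only question to ask about an instance is whether it is healthy, which is roughly the "working / not working" reduction described above.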
https://www.youtube.com/watch?v=6Rf5ChCSoBs | |
But We Can't Do That Here! - Liz Keogh | |
- I liked this talk a lot. | |
- Main things I took from it: | |
- Cynefin: a framework for thinking about domains of problem-solving /
decision making | |
- Chaotic (act, sense, respond) | |
- Complex (probe, sense, respond) | |
- Complicated (sense, analyze, respond) | |
- Obvious (sense, categorize, respond) | |
Knowing which domain you're in is key to not making expensive mistakes. | |
- Utility of value-stream mapping, even for teams whose primary | |
'customer' is within the organization | |
- Related a story about how an infrastructure team identified 50
steps to acquiring a server, then found the points where the | |
process was disrupted, then focused on those steps. | |
https://www.youtube.com/watch?v=IAcUalc5_d0 | |
Increasing the Dependability of DevOps Processes - Ingo Weber | |
- Describes a framework for alerting on anomalous occurrences in a cloud
deployment environment | |
- Not gov't-focused, but Weber is part of an org that is "technically" | |
governmental | |
- Like Quentin Adam's presentation, indicates that most failures are the | |
result of human error, but actually cites a study to back that up. | |
- Framework involves creating models for various actions performed on the | |
system. Main example is a rolling deployment of a new release. | |
- Based on log events and polling of cloud APIs | |
- Overall approach is called POD or Process-Oriented Dependability | |
- Alerts based on unexpected state transitions indicated by cloud API | |
results and observed log messages (e.g., restarting some group of servers | |
and not all servers in the group have a 'shutdown' and a 'startup' | |
message before a 'complete' message for the restart command) | |
- Offline training for the models. ME: Why not online? In general, the
behavior observed in production can be different in terms of response | |
times and nominal log volumes. Although the log events described are | |
probably conserved between environments, Weber leaves out a whole class | |
of other events with only offline training. | |
- Describes some timing-dependent alerts. ME: The example shows what look | |
like 4 different modes, which should be broken up into distinct models,
but still maybe a good approach. | |
- Near the end, also discusses how corrective actions could be triggered on | |
alerts. | |
- ME: It doesn't seem like this approach would serve well for "unknown | |
unknowns" or undesirable emergent behavior in a system that, although | |
it's undesired, still fits the trained models. Not to mention, the burden | |
of creating log events for a bunch of things before you even realize that | |
they're predictive of unexpected failures. | |
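
The restart example above can be sketched as a small process-model check. This is only my illustration of the idea, not Weber's POD implementation: the `(server, message)` event format, the message names, and `check_rolling_restart` are assumptions, and the model is hand-written rather than trained.

```python
from typing import Iterable, List, Tuple

# Hypothetical log event: (server name, message). A 'complete' event refers
# to the restart command as a whole, so its first field is ignored.
Event = Tuple[str, str]


def check_rolling_restart(servers: Iterable[str],
                          events: Iterable[Event]) -> List[str]:
    """Return alerts where the observed events deviate from the expected model."""
    # Expected per-server transitions: running -> shutdown -> startup.
    expected_next = {"running": "shutdown", "shutdown": "startup"}
    state = {name: "running" for name in servers}
    alerts: List[str] = []

    for server, message in events:
        if message == "complete":
            # The command claims to be done: every server must have restarted.
            for name in sorted(state):
                if state[name] != "startup":
                    alerts.append(
                        f"{name}: restart reported complete in state '{state[name]}'")
            return alerts
        if server not in state:
            alerts.append(f"{server}: event from unknown server")
        elif expected_next.get(state[server]) != message:
            alerts.append(
                f"{server}: unexpected transition {state[server]} -> {message}")
        else:
            state[server] = message

    alerts.append("restart never reported complete")
    return alerts


good_run = [("a", "shutdown"), ("a", "startup"),
            ("b", "shutdown"), ("b", "startup"), ("group", "complete")]
bad_run = [("a", "shutdown"), ("a", "startup"),
           ("b", "shutdown"), ("group", "complete")]

good_alerts = check_rolling_restart(["a", "b"], good_run)
bad_alerts = check_rolling_restart(["a", "b"], bad_run)
```

Here `bad_run` triggers an alert because server `b` never logged a 'startup' before the 'complete' message. The sketch also shows the limitation noted above: anything that is not expressed as a log event in the model goes unnoticed.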
Building Technical and Organizational Confidence Through Automated Deployments - Mieke Deenen | |
- A very cool "lessons-learned" talk about getting an organization on board | |
with a DevOps cultural shift | |
- About the social security collection and benefits site for the | |
Netherlands, werk.nl | |
- Started relatively small with one success, then, with the confidence
gained, moved on to the customer service division
ME: Not explicitly mentioned, but custsvc seems like an excellent place | |
to start from a value-stream perspective considering this is a gov't org | |
- Automated deployments were used for flexibility rather than just speed | |
ME: This is something I've often thought about: not just "we can deploy
faster", but, now that we don't have to wait long for deployments, "what
freedom does a quick deployment give us to experiment with things?"
https://www.youtube.com/watch?v=Ulp91L2zXPE | |
How We Went From 40 Days to 3 Building Crystal Clear Test Cases While Improving Test Coverage! - Stephen Tyler | |
- A model-based testing workflow / toolchain talk | |
- Characterized as experimental | |
- Primary concern is reducing defects that 'escape' to prod | |
- A couple of anecdotes and a few case studies, but no comprehensive data | |
showing reduction in defect escape across a variety of programs | |
Lessons in Leading a Fortune 100 Team to a DevOps Philosophy - Uldis Karlovs-Karlovskis | |
- From the "Nordic DevOps Lead" at Accenture (Accenture Latvia).
- About managerial / communication structures | |
- Mostly, didn't seem very reproducible, but suggested take-aways: | |
1. Assume people are trying to do the right thing | |
2. Seek intrinsic rewards rather than extrinsic rewards (e.g., cash rewards) | |
3. Engagement is an employee responsibility | |
4. Let people lead (in their own way) | |
- ME: Not that interesting for a software engineer... | |
https://www.youtube.com/watch?v=SRXohzWQkp0 | |
Escrow: How To Share Secrets - Kyle Rickman | |
- Under Armour: Connected Fitness
- Rickman is a software development infrastructure engineer for internal
teams
- Backstory for talk is Under Armour acquired 3 different companies with
different products and toolchains who needed to share key-value data | |
between the groups of developers from those companies. | |
- Wants to permit sharing and collaboration without revealing data between | |
the groups that shouldn't be shared and without requiring interaction | |
with the infrastructure engineer in order to share the data. | |
- Developed a tool called Escrow for that purpose | |
- Escrow has hierarchical key-value lists called "Chains" and each level in | |
the hierarchy is called a link | |
- Emphasizes "API-first" design, or having web API that devs can use to | |
build integrations | |
- A group or a user owns a link and can make the link Private or Public | |
- "Escrow" addresses integrity through "Rendering" chains, which holds the | |
values in the chain constant at whatever point in time the rendering is | |
made. The rendering is called an "Artifact" | |
- ME: The Artifact concept doesn't fully address integrity since a group
can, apparently, write arbitrary keys and values, potentially overwriting | |
higher-integrity values farther up in the chain. Perhaps the Chain | |
construction process is expected to identify such cases, but it seems | |
like there's a real limitation there. | |
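
A rough sketch of how I understand Chains, Links, and Artifacts, and of the overwrite concern. This is not Escrow's real API or data model: `Link`, `render`, the owner names, and especially the rule that later links override earlier ones are my assumptions based on the talk.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple


@dataclass
class Link:
    """One level of a chain (hypothetical shape): an owner plus key-value pairs."""
    owner: str
    values: Dict[str, str] = field(default_factory=dict)
    public: bool = False


def render(chain: List[Link]) -> Tuple[Mapping[str, str], List[str]]:
    """Freeze the chain's current values into a read-only artifact.

    Assumes links are ordered root first and that a later link wins on a
    key collision. Also returns a list of the collisions that occurred.
    """
    merged: Dict[str, str] = {}
    set_by: Dict[str, str] = {}
    overrides: List[str] = []
    for link in chain:
        for key, value in link.values.items():
            if key in merged and merged[key] != value:
                overrides.append(f"{key}: {link.owner} overrides {set_by[key]}")
            merged[key] = value
            set_by[key] = link.owner
    # Read-only view over a private copy: later edits to links do not leak in.
    return MappingProxyType(dict(merged)), overrides


chain = [
    Link("platform", {"db_host": "db.internal", "tls": "on"}),
    Link("team-a", {"tls": "off", "feature_x": "1"}),
]
artifact, overrides = render(chain)

# Changing the chain after rendering leaves the artifact untouched.
chain[1].values["feature_x"] = "2"
```

The artifact is stable once rendered, but `team-a` was still able to turn `tls` off before rendering. Reporting overrides, as this sketch does, is one way the chain construction step could surface such cases; whether Escrow does anything like this was not clear from the talk.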
A DevOps State of Mind: Continuous Security with DevSecOps + Containers - Chris Van Tuin | |
- Last name pronounced "Van-Tie" | |
- Red Hat strategist
- Disruptors
Empowered organization
Rapid Innovation
Data-Driven Intelligence
Culture of Experimentation (enabled by IT automation) | |
- Suggests that containers + cloud enable easier dev/sec integration | |
- Describes a pipeline for building out to containers and moving to | |
production through promotion steps | |
- For security fixes, the model is to fix in the container and deploy the
container. This obviously doesn't address security issues in the container
runtime (e.g., Docker) itself or in the container orchestration platform,
but it's a more reliable approach than having humans patch servers or
even having a script run against live containers. It's actually
the same idea as the Clever Cloud CEO Quentin Adam's 'immutable
infrastructure'.
- ME: The mention of a company-wide docker registry really appeals to me. | |
We currently have a docker registry just for our program that houses | |
public images as well as our program-specific images. There are security | |
issues that come up (as suggested in the talk) when vulns are discovered | |
which are still sitting in publicly accessible containers. We should be
able to make a registry within the co. for, at least, public images. | |
- ME: Tesla manufacturing example...probably not the best example | |
considering their current 'production hell' | |
https://www.youtube.com/watch?v=ApVI7-g_wpk | |
https://www.youtube.com/watch?v=OaojdXYSkpI | |
Secrets of a High Performance Security Focussed Agile Team - Kim Carter | |
- Sensible security model | |
- Bruce Schneier. 5 steps
- Talks about his book chapters, which is a confusing way to start...and
somewhat annoying.
- Talks about 'code monkey' vs 'professional dev' | |
- Describes ways to introduce security-centric activities into a sprint | |
ME: The presenter sounded ill. Coughs, sniffs and swallows were distracting. |