MicroXchg 2017

## Resilient functional service design

The problem

you don't make any money until you go to production
you don't make any money unless your software is available & responsive
distributed systems change the rules of making it robust - throwing money at hardware is no longer enough
Failure is now the norm, it's unpredictable, it's going to get worse. Don't try to avoid failures, accept they'll happen

Design for resilience

If you don't get the core resilience right all the monitoring & recovery won't save you
Systems should fail in isolation and not cascade
- However sometimes services depend on each other from a business point of view
- It doesn't matter how many circuit breakers you have, once you have a dep chain like this you're screwed
- By trying to avoid this you accidentally build a monolith
We learn functional decomposition, DRY, layered architecture, design for reusability
- But this leads to tightly coupled, low cohesion, non-resilient services...
Caches to the rescue?
- "Do you really think that copying stale data all over your system is a suitable measure to fix inherently broken design?"
- Important, but not a replacement for good design

Re-learning systems design for distributed systems

Bulkhead design
Of course there's no silver bullet
Think about the foundation of design
- High cohesion, low coupling
- Separation of concerns
Resources
- 1972 paper: decomposing systems into modules
- Lean book
Do DDD (as opposed to EDD, entity-driven design) - ffs, ubiquitous language
- DO NOT start with the data model
- You'll find the separation of concerns in the business model, in the dynamic model
Short activation paths
- Minimise the amount of internal remote calls to satisfy one request
DISMISS REUSABILITY
- Reusability == coupling
- Leads to bad service design & compromises availability
- Rarely pays off, avg reusability factor of 1.1 or 1.2 - it needs to be 5 to be worth it
- "Do not strive for reuse, strive for replaceability"
- If a module should have been made reusable, it will become evident over time
Think about your communication paradigm
- Horizontal (sync) vs. vertical (async / event) slicing
- Influences overall service design a lot, and the resilience patterns to use
- Choose carefully, don't limit your design options without understanding the reasons behind and ramifications of your decision

DDD & REST - Domain-Driven APIs for the Web

If you accept a core domain object as a string, how do you know it's valid? e.g. a string email vs. an Email value object (VO)
Your persistence engine choice can change the way you think about your entities and aggregates
Ubiquitous language is contained between bounded contexts - e.g. order in PO context vs. logi context
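
To make the string-vs-value-object point concrete, here's a minimal sketch. The `Email` class, the deliberately loose regex and `register_customer` are all illustrative assumptions, not something shown in the talk.

```python
from dataclasses import dataclass
import re

# Deliberately loose pattern (an assumption): one "@", no whitespace, a dot in the domain.
_EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class Email:
    """Value object: an instance can only exist if the address passed validation."""

    value: str

    def __post_init__(self) -> None:
        if not _EMAIL_PATTERN.match(self.value):
            raise ValueError(f"not a valid email address: {self.value!r}")


def register_customer(email: Email) -> str:
    # No validation needed here: holding an Email already implies it was checked.
    return f"registered {email.value}"
```

With a plain `str` parameter, every function that receives the address would have to re-check it (or trust the caller); with the VO, the check happens once at construction.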

### Domain events

Level 0 - CRUD
Level 1 - Explicit operations
- In terms of business operations (UL)
Level 2 - some operations are events
Level 3 - CQRS & ES - event all the things (out of scope of this talk)
Prevents feature creep - events help decouple - avoid integration issues
- Move event creation to the aggregate - e.g. Order::complete
Treat events as an explicit concept - same as you're explicit with your types & VOs
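
A rough sketch of "move event creation to the aggregate", roughly what `Order::complete` could look like. The `OrderCompleted` type, the `pending_events` list and `pull_events` are assumptions for illustration; how events actually get published is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OrderCompleted:
    """Explicit event type, named in the ubiquitous language."""

    order_id: str


@dataclass
class Order:
    order_id: str
    status: str = "open"
    pending_events: List[object] = field(default_factory=list)

    def complete(self) -> None:
        # Explicit business operation (level 1) that also records an event (level 2).
        if self.status != "open":
            raise ValueError(f"cannot complete an order in status {self.status!r}")
        self.status = "completed"
        self.pending_events.append(OrderCompleted(self.order_id))

    def pull_events(self) -> List[object]:
        # Hand the recorded events to whatever publishes them, then reset the list.
        events, self.pending_events = self.pending_events, []
        return events
```

The aggregate decides when the event exists; callers only drain `pull_events()` after saving.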

REST

REST is NOT CRUD over HTTP
DDD modelling can help a lot here
- Aggregates - 3 important characteristics - identifiable, referable, scope of consistency
  - Same as REST resources!
  - Look at where your aggregates are, and shape your resources around your aggregate boundaries
  - Representation design matters, you should always take the aggregate into context
    - In REST you can represent this with hypermedia / HATEOAS
      - You can represent for example status with this - e.g. you can cancel an order when a "cancel" link is available, rather than based on status id
      - Reducing the complexity of business decisions that a client has to make
      - Reducing domain knowledge in the client
  - This helps with API evolvability
    - Key in a system of systems - allows deployment without forcing other systems to update as well - Blog post
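
The "cancel link instead of status id" idea could be sketched like this. The `_links` layout is HAL-ish, and the statuses and URLs are made up for the example.

```python
from typing import Any, Dict


def order_representation(order_id: str, status: str) -> Dict[str, Any]:
    """Build a representation whose links reflect what is allowed right now."""
    links = {"self": {"href": f"/orders/{order_id}"}}
    # The server owns the business rule (assumed here: open or paid orders can be cancelled).
    if status in ("open", "paid"):
        links["cancel"] = {"href": f"/orders/{order_id}/cancellation"}
    return {"id": order_id, "status": status, "_links": links}


def client_can_cancel(representation: Dict[str, Any]) -> bool:
    # No status ids or domain rules in the client, just link presence.
    return "cancel" in representation["_links"]
```

If the cancellation rule changes later, only `order_representation` changes; clients that check for the link keep working, which is the evolvability point above.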

Microservice Websites

Problem

How to develop a website with multiple teams?
Different business units making a website that feels like one contiguous experience
Frontend as a bottleneck
"Decentralised Governance" gives an option for teams to choose different tools (book by fowler)
Mobile perf (the thing you were thinking of writing about)

Transclusion

Including all or part of an electronic document into one or more other documents by hypertext reference
Expose a fragment resource, /shopping-cart, consume declaratively like <img src="">
See: Edge Side Includes <esi:include src=""> - server side, published as a W3C Note (not a full Recommendation)
- Requires transpilation, supported by Akamai, Fastly etc.
- Allows you to cache the shit out of most of your page, and just reload dynamic elements, e.g. notifications
See: <h-include src=""> - client side library with custom elements, transitive includes, http2, lazy loading!
- Async but has XHR lag
- Vanilla JS and polyfills only
Can use both together to use best tradeoffs
now you have service dependencies - fragment is dependent on its own CSS/JS
Need cache busting
Server side transclusion works well here
Dude wrote a thing
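
For intuition, server-side transclusion boils down to something like the sketch below. It's a toy: it only handles the self-closing `<esi:include src="..."/>` form, and the `FRAGMENTS` dict stands in for HTTP calls to each team's fragment resource (both are assumptions, not how Akamai or Fastly implement it).

```python
import re
from typing import Callable

# Matches <esi:include src="..."/> (self-closing form only, for brevity).
_INCLUDE_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def transclude(page: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replace each include tag with the fragment served at its src."""
    return _INCLUDE_TAG.sub(lambda match: fetch_fragment(match.group(1)), page)


# Stand-in for HTTP calls to the teams' fragment resources.
FRAGMENTS = {"/shopping-cart": "<div>3 items</div>"}

page = '<header><esi:include src="/shopping-cart"/></header>'
html = transclude(page, FRAGMENTS.__getitem__)
```

The page shell can be cached aggressively while each fragment keeps its own cache lifetime.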

Microservices and mobile

Fast iterations for everyone (even on mobile)
As an industry we need to push to reject manual review (in app stores)
- This is "mobile waterfall"
Couldn't release a feature that was only for Berlin, because reviewers were in US
Bugfix? Half a week >:|
Canary releases? Forget it
Extensive E2E tests or manual QA mitigate bugs, but aren't really a solution, and just make the process slower
Beta users (e.g. testflight) are good, but don't let you deploy 10 versions for canary testing
web vs native all over again
PWAs solve a lot of the cons of web apps :party:
Hybrid applications have the cons of both (not counting react native), but slightly quicker deployment cycles as you can update the embedded webapp. Also perf sucks

Roll your own solution

Crazy, but let's do it anyway
Building your mobile app as a parser for your microservice's data
Build it as configuration - when this event is received, perform these actions
- Ask the server what should happen each time - application logic as a service!
- This is fucking cool - basically your own DSL for your app
- You can choose your trade-offs:
  - Performance vs. complexity
  - Quicker iterations
  - Cost efficiency

#### Pseudo-microservice for mobile

When your application starts, get a list of services from the backend
Register event handlers
When the event occurs, query your service and perform the resulting actions
This way you can even canary test multiple business logic workflows
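
The steps above can be sketched as follows. The action names, the workflow format and `fake_backend` are all hypothetical; a real app would fetch the steps over the network and map them onto native UI code.

```python
from typing import Callable, Dict, List

# Actions the app knows how to perform natively (hypothetical names).
log: List[str] = []
ACTIONS: Dict[str, Callable[[dict], None]] = {
    "show_toast": lambda args: log.append(f"toast: {args['text']}"),
    "navigate": lambda args: log.append(f"navigate: {args['screen']}"),
}


def fake_backend(event: str) -> List[dict]:
    """Stand-in for the remote service that decides what an event should do."""
    workflows = {
        "like_tapped": [
            {"action": "show_toast", "args": {"text": "Liked!"}},
            {"action": "navigate", "args": {"screen": "likes"}},
        ]
    }
    return workflows.get(event, [])


def handle_event(event: str, ask_server: Callable[[str], List[dict]]) -> None:
    # Ask the server what should happen, then run the returned steps in order.
    for step in ask_server(event):
        ACTIONS[step["action"]](step["args"])


handle_event("like_tapped", fake_backend)
```

Because the workflow lives on the server, swapping `fake_backend` for a different responder is enough to canary a different business flow without an app store release.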

https://github.com/waterlink/LikeyLikes https://github.com/waterlink/LikeyLikesStatic https://git.io/vDJKG

LT: Too many microservices

MSs are good
- Iterate quickly
- Smaller is simpler
- Better defined responsibilities & teams
They mean
- More services
- More teams
- More communication
- Need for more documentation
Problems
- What is available on the platform?
- Platform architecture?
- Who is responsible for a service?
Wikis are shit
You could try to collect metadata from your microservices
- API discovery via crawling swagger / openAPI
- https://github.com/zalando-incubator/api-discovery
- Added a dropdown list to swagger UI to be able to browse docs for all APIs
Before:
- YAML files kept up to date
- Wiki generated from these files
- But search was limited, no immediate benefit
Then came up with pivio.io
- Who owns what
- Service registry
- Fancy diagrams
- Searchable (elasticsearch)
- Built-in query language

The pretty face of your microservice

Consider your API as a UI with developers as the users
- Or machines! :robot_face:

Start with a really good idea
- A fancy API won't rescue a useless microservice
- Don't be afraid to throw stuff out
Match your system to the real world
- Ubiquitous language and DDD
Don't reinvent the wheel
- Follow standards
- Share patterns, traits and schemas - so paging, filtering etc. is always done the same way
Internal consistency
- The same action should work the same way
- Pattern library
- Consistent error codes & messages (drink)
- Naming conventions (drink)
Prevent errors
- Make it very hard to make mistakes
- Validation rules should go in API definition
- Be tolerant when reading input
Minimalist design
- Don't put more stuff in your API than you need
- Ask for the bare minimum, and avoid redundancy
- Use references instead of the full dataset
Help and documentation
- Ensure reality matches docs
- Make it easy for people to read (good structure & search, up to date)
- Explain how to recover from errors
Break the rules
- When standards and usability disagree, follow usability
- Remain consistent
- Know why you are breaking the rules, master the rules first
Don't justify your design
- If your users don't like it, change it

Applying runtime configuration to a microservice architecture

smartlaw.de
White label config (different tenant, same environment, so different config on a per-request basis)
This one is boring as fuck and the guy is in a suit
But at least he has a tag cloud
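
Per-request tenant config might look roughly like this. The `X-Tenant` header, the tenant names and the config keys are invented for the sketch; the talk's actual mechanism isn't captured in these notes.

```python
from typing import Dict

# Defaults shared by every tenant, plus per-tenant overrides (all values hypothetical).
DEFAULTS: Dict[str, str] = {
    "brand": "Acme Legal",
    "theme": "blue",
    "support_email": "help@example.com",
}
TENANT_OVERRIDES: Dict[str, Dict[str, str]] = {
    "partner-a": {"brand": "Partner A Law", "theme": "green"},
}


def config_for_request(headers: Dict[str, str]) -> Dict[str, str]:
    """Resolve configuration per request, keyed on a tenant header."""
    tenant = headers.get("X-Tenant", "")
    # Unknown or missing tenants fall back to the defaults.
    return {**DEFAULTS, **TENANT_OVERRIDES.get(tenant, {})}
```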

Microservices: the organisational & people impact

Most of the problems with microservices are connected with people and organisational systems

Strategy: situational awareness
- Are microservices a good fit?
  - Middle management are latching onto microservices as a buzzword to brand things
  - Lipstick on the pig
- Not understanding architecture principles
  - Build around business functionality (DDD)
  - Creating mini monoliths (12 factor!)
- No well defined devops
  - Deployment / ops free for all
- Microservices are not a silver bullet if you have these problems
  - Determine business goals, hypothesise, choose tech, and validate
- So what are our goals?
  - Delivering value
  - Business agility
  - Safer, more rapid changes to software
  - Why jump to microservices? CI/CD/DevOps, value stream should come first. Important foundation
- Wardley maps: useful technique to understand business and technical landscapes
- You have to know where you are before you can decide where to go
- Choose tooling to support your approach - don't change your approach because of your choice of tooling
Define goals
- SMART goals
  - Get your stakeholders involved
- Communicate the vision
  - Map strategic goals to architecture goals
  - Map these back to development and practice goals
- DO UBIQUITOUS LANGUAGE RIGHT ARGH
Technical leadership is vital
- Promote shared understanding - it's about communication
- Do proper risk management
- "Just enough" up-front design - how much architecture is just enough?
- Conway's law is well accepted, but it's not so clear where architects sit
  - Overarching, consulting, or per team?
  - PO having final say is bad - three amigos pattern
- Technical insanity antipattern
  - We created a tech mess
  - But no change is required???
  - We need a strong technical lead
  - If you can't create a well-architected monolith what makes you think you can make microservices???!
- InnerSource
  - Encourages sharing and documentation
  - Reduces tribal knowledge
  - Draw a map, rather than copy the territory
  - Quality guidelines come baked into testing
Evaluating tooling
- Spine model
- We get stuck at a dogmatic level with the tooling, the spine model helps you decide what you value rather than making knee jerk reactions
- Bias is super real
- https://www.amazon.de/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
- Antipattern: Technical envy
  - Blindly copying e.g. Netflix, Spotify
  - Learn the context, principles, practices, culture
  - Understand the advantages and drawbacks
Feedback - visibility and constant learning
- Business, architecture, operations
- Business:
  - Dashboards and metrics are useful
  - Microservices should be business-driven
  - Validate hypotheses
  - Share metrics regularly
  - Show the benefit and business value the microservices bring
- Architectural feedback
  - Your code as a crime scene
  - Visualise churn and complexity
  - Work out where the "slums" of your code are
  - Same thing for code quality
- Antipattern: Trojan monoservice
  - When you accidentally make a monolith
  - Matthew Skelton: Types of software monolith
  - Continually retrospect on technical work using supporting metrics
- Operational visibility
  - Logging, monitoring, alerting (drink!)
  - When bad things happen, people are always involved
  - Mikey Dickerson & healthcare.gov
  - A little standardisation goes a long way
    - Automation is the goal
    - Understand problems with postmortems, get at the root cause
    - Checklists provide structure
- When done well, microservices enable agility
  - But if you don't build in signals and metrics, and you don't have the data and adapt to it, then there's limited benefit
Responsibilities
- Just change to squads, chapters, guilds. Problem solved!
  - Learn from Conway, Netflix, Spotify, but don't blindly cargo cult
- Devops
  - Antipattern: The fullstack myth
  - Define responsibilities, who owns what (gitlab: nobody owned backups)
  - Focus on what matters wrt microservices, you need to have devops nailed down first
  - Top: CI/CD - how much value is there in non-deployed code?
- Antipattern: Water-micro-fall
  - The "perfect" microservice
  - Not validating assumptions
  - Change mindset to continuously deliver incremental changes to production asap
  - Dancing skeleton: Get something super simple through a pipeline to production ASAP
- Change management is essential
- Transformation is a process, you can't buy devops
- Leading Change by John P Kotter

Day 2

## Authorization and Authentication in microservice environments

### Problem

Log in and see the UI, but the UI might be powered by different microservices
- How do these know what the user is allowed to do?
JWT can help
- Log into auth service, get JWT, UI sends JWT to microservices
- Microservices can check token validity themselves (signature) - they don't need the auth service anymore
- Two types, JWS / JWE
JWS
- Three parts: header, payload (claims), signature
- Header contains algorithm that was used
- Payload contains iss, exp, sub... you know this stuff
- Signed with secret (private key)
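
To see the three JWS parts in action, here's a stdlib-only sketch. It assumes HS256 (a shared secret) purely to avoid dependencies; with an asymmetric algorithm the microservices would only need the public key. It also skips `exp` and `alg` checks, so use a maintained JWT library for real work.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWS uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify(token: str, secret: bytes) -> dict:
    """Return the claims if the signature checks out, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if not 3 parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```

`verify` needs nothing but the token and the key, which is why the services don't have to call the auth service on every request.
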
JWE
- Five parts:
  - Header
  - encrypted key
    - symmetric
    - encrypted with shared secret
  - initialisation vector (salt)
  - cipher text
    - encrypted payload
    - encrypted with enc algorithm
    - encrypted using initialization vector
  - auth tag
    - also a result of enc algo
    - ensures integrity
- Two additional keys in the header:
  - enc: Encryption algorithm for the cipher text
  - zip: compression algorithm
- Pro: Everything is unreadable to the user
- Con: Have to distribute private keys to microservices

## Secure Microservices Adoption

By isolating services, we isolate security risks
Not every service is equally important, some are higher value targets
Isolated services reduce overall security risks
Problem: End users need to interact with multiple services
Problem: End users frequently include many roles
Solution 1: API gateway
Solution 2: Backend for frontend for different end users
Trust boundary: at this point you need Authentication, Authorization, validation
- Different use cases fit different authN solutions
- internal boundaries
  - Maybe your service shouldn't access other, more sensitive services (privilege escalation)
Is this client allowed to access this entity?
- Customer can't modify order for another customer
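
The entity-level check ("is this client allowed to access this entity?") is separate from authentication. A minimal sketch, with the `Order` fields, the `Forbidden` exception and `update_note` all made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    customer_id: str
    note: str = ""


class Forbidden(Exception):
    """Caller is authenticated but not allowed to touch this entity."""


def update_note(order: Order, caller_customer_id: str, note: str) -> Order:
    # A valid token is not enough: the order must belong to the caller.
    if order.customer_id != caller_customer_id:
        raise Forbidden(
            f"customer {caller_customer_id} does not own order {order.order_id}"
        )
    order.note = note
    return order
```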

Secret management

Secret management software should:
- Store & transfer encrypted
- Audit all access
- Rotate automatically
- Fine grained ACL

Beyond OAuth2: E2E Microservice Security

More teams
- Smaller (2 pizza)
- More independent
More trust boundaries
Speed to production
Mo' processes mo' problems :meth-mo:
OAuth2 to the rescue
One token to rule them all
- Danger!
- The token is too powerful - can do anything to the system as that user
- Only limit is expiry
- Token leakage is a big deal
Client credentials grant type can be used internally
- Don't pass user jwt around, resource can get its own token representation
- Match tokens to your org's trust boundaries
  - Teams maybe don't fully trust each other, apps perhaps shouldn't either
Proposal for new oauth2 grant type: token exchange
- Given actor + subject + audience, get a new token
  - Policy decision given caller, user and intent
  - New token expresses these
- Given actor + previous token + audience, get a new token
  - Policy decision based on delegation chain (call stack)
- Now we can take internal trust boundaries into account
- Pros:
  - User, client, call stack are part of policy decision
  - can request very limited power tokens (audience & scope)
  - Trust boundaries are unambiguous, all information is present to the auth server
  - Centralised policy management
- Cons:
  - Network and auth server overhead
  - Security vs. performance tradeoff
    - Token caching & reuse
  - Policy management vs. agility
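
A toy model of the token exchange decision, to show how actor, subject, audience and the delegation chain feed into it. The `POLICY` table, service names and `Token` shape are assumptions; a real auth server would issue signed tokens rather than plain objects.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical policy table: (actor, audience) -> scopes the actor may get there.
POLICY = {
    ("checkout", "payments"): frozenset({"payments:charge"}),
    ("checkout", "inventory"): frozenset({"inventory:reserve"}),
}


@dataclass(frozen=True)
class Token:
    subject: str            # the end user
    audience: str           # the one service this token is valid for
    scopes: FrozenSet[str]  # limited power: only what the policy grants
    chain: Tuple[str, ...]  # delegation chain, i.e. the call stack so far


def exchange(
    actor: str, subject: str, audience: str, previous: Optional[Token] = None
) -> Token:
    """Policy decision on caller, user and intent; returns a narrowly scoped token."""
    scopes = POLICY.get((actor, audience))
    if scopes is None:
        raise PermissionError(f"{actor} may not call {audience}")
    chain = (previous.chain if previous else ()) + (actor,)
    return Token(subject, audience, scopes, chain)
```

Every hop costs a round trip to the auth server, which is the overhead listed under cons; caching tokens per (actor, subject, audience) is the usual mitigation.
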
Dude made a thing, it's all java
- JWT-ception
- Single use JWT
  - 1 aud, 1 op
- Embed JWT inside JWT for nested microservice calls

Understand, Automate, Collaborate for Development Speed with Microservices

The challenge of a postmodern software developer
Engineer of your own problems
I'd like to be writing some code, but I have to do other stuff
- Navigating VCS
- Clicking things in the browser
- Jumping back and forth a LOT
- CI and CD
- DevOops
With microservices this is x100
Alt+Tab should not be a key skill!
We have great tools out there
- But they're not really aware of each other
- They also can't read your mind
- You're the one who has to map that back to your context
Modern software development is a cognitive overhead problem
How many microservices do you need to comprehend to do your work?
This is the reason for SRP - reduce cognitive overhead!
but we keep creating the problem for ourselves
So:
- What is of use?
- What should I notice and what should I ignore?
- Where do I need to go to get it?
Bring in information at the right time right to your eyeballs, as and when you need it
- Shouldn't have to go and get it
- No more "have you seen X" where X is something critical that's lost in the noise
- Actionable information at all times
Microservice systems become big data problems in their logging alone
- What's the bare minimum I need to see?
- What extra information can be used to enrich that?
We've been creating highly complex systems that seem to hate human beings for a long time
How to solve these problems?
- Chat? (slack)
  - No wait, far too much chat
  - Noise amplification system
  - Too many integrations, too much giphy
  - Jet cockpit with all the dials
  - Please please no more slack
- It has to be more than just show
  - Chat can end up a nightmare, but with the right thinking you can turn it into much more
    - It's a lousy dashboard, but a great way to collaborate
    - Get several people aware of something and working on it
    - Let's not turn slack into a shitty dashboard, it's not about pushing information in there. It's about doing stuff.
  - Observe
    - Make me aware
    - Notify me just when I need to know
  - Orient
    - Show me what I need to know
    - Supporting information
    - Who can I talk to? Who made the last commit?
    - Bring them in
  - Decide
    - Help me figure it out
    - Where should I look?
  - Act
    - Help me act
  - Chat is a great system for enabling an OODA loop for your system
    - Forget all the hype, use it for collaboration - what it's designed for
Visibility and Control to Automate Software Development
- Less busywork
- A key skill in microservices is creating a new project
  - Get it in CI
  - Get it deployed
  - etc.
  - Make this as easy as possible
  - Often you walk into a project that's already there, making something new is a key skill and takes much longer than you think
Atomist - make sense of your software development flow
- Tighten your OODA loop inside a collaborative environment (slack)
- Integrate all the things so you can get the right information to the right eyeballs and the right actions to get stuck in
- Less yak shaving
Normal chat: STUFF IS GOING ON
- @atomist: list issues
  - Takes deluge of info and helps you make decisions
- @atomist: create issue
  - Uses slack thread (neat idea)
  - Gives you buttons for actions
- Collaborative
- E.g. travis build.... then release button appears
- Rug files (dsl) for configuration or typescript
- Also has editors for rewriting / linting code
  - Done through slack UI again :D

Conway's Law and the Innovator's Dilemma

If the building blocks are already there it's easier to glue them together
Pedro got sherlocked
Divide and conquer
Independent, stable teams building independent services
- If you want to go fast, go alone
Think global, act local
- Global vision to align
- Shared values to set focus
- Local decisions - microservices, not micromanagement
Decentralised planning
Microteams, microservices, microwins
- "billing" other teams for their services
- "Cost" and service on a team level rather than org, so it's proportional
Planning, prioritising, and saying no
Made a proto-persona
Put him on the map from "crossing the chasm" bell curve
say no to those too far over the chasm, you can't develop everything at the same time and one sector is more important
Build-measure-learn-add more customers
Serendipity strikes again when reading the innovator's dilemma
- Divide and conquer
- Be friends with your fans
- Deliberate ignorance
- Optimised processes
- Small teams big wins
5 challenges
- Companies depend on customers and investors for resources
- Markets that don't exist can't be analysed
- Technology supply may not equal market demand (minidisc)
- An organisation's capabilities define its disabilities
- Small markets don't solve the growth needs of large companies (but divide and conquer, feel more important)

With great power comes great responsibility

Distributed Scheduler Hell

How we moved 100s of VMs into containers
How we deploy a distributed database into production (digitalocean/vulcan, port of Prometheus)
Requirements:
- 3Gbps traffic
- 20TB storage
- <100ms read times
- 100k write ops
Prior: Everything was a VM
Arduino: 1 process -> kernel (processes etc)
Distributed scheduler provides:
- Container deploy
- v memory deploy
- memory quota
- Disk storage
- Networking
- CPU scheduling
E.g. mesos
Deployed everything to mesos, no more devops... everything broke
Distributed: different applications will run on different nodes and you don't care
Kafka as a custom scheduler on top of marathon on top of mesos on top of linux
- dafuq
Mesos killed kafka master and deleted all the data :D
Deleted mesos :D
Found hashicorp nomad instead
- has its own gossip protocol so that there's no master to be down
- If one datacenter goes off it still works
- However nobody uses it and they don't handle state
kubernetes instead
- Can hide complexity
- Kubelets make things more resilient than mesos
- Kube yml is verbose
- Really good CLI
Upsides to distributed schedulers:
- No more devops, just throw your containers up there and be done

Alles hat ein Ende, nur die Wurst hat zwei ("Everything has an end, only the sausage has two")
