## Resilient functional service design
- you don't make any money until you go to production
- you don't make any money unless your software is available & responsive
- distributed systems change the rules of making it robust - throwing money at hardware is no longer enough
- Failure is now the norm: it's unpredictable, and it's going to get worse. Don't try to avoid failures; accept that they'll happen
- If you don't get the core resilience right, all the monitoring & recovery won't save you
- Systems should fail in isolation and not cascade
- However sometimes services depend on each other from a business point of view
- It doesn't matter how many circuit breakers you have: once you have a dependency chain like this, you're screwed
- By trying to avoid this you accidentally build a monolith
- We learn functional decomposition, DRY, layered architecture, design for reusability
- But this leads to tightly coupled, low cohesion, non-resilient services...
- Caches to the rescue?
- "Do you really think that copying stale data all over your system is a suitable measure to fix inherently broken design?"
- Important, but not a replacement for good design
- Bulkhead design
- Of course there's no silver bullet
- Think about the foundation of design
- High cohesion, low coupling
- Separation of concerns
- Resources
- 1972 paper: David Parnas, "On the Criteria To Be Used in Decomposing Systems into Modules"
- Lean book
- Do DDD (as opposed to EDD, entity-driven design) - ffs, ubiquitous language
- DO NOT start with the data model
- You'll find the separation of concerns in the business model, in the dynamic model
- Short activation paths
- Minimise the amount of internal remote calls to satisfy one request
- DISMISS REUSABILITY
- Reusability == coupling
- Leads to bad service design & compromises availability
- Rarely pays off: the average reusability factor is 1.1 or 1.2, but it needs to be 5 to be worth it
- "Do not strive for reuse, strive for replaceability"
- If a module should have been made reusable, it will become evident over time
- Think about your communication paradigm
- Horizontal (sync) vs. vertical (async / event) slicing
- Influences overall service design a lot, and the resilience patterns to use
- Choose carefully, don't limit your design options without understanding the reasons behind and ramifications of your decision
- If you accept a core domain object as a string, how do you know it's valid? E.g. a string email vs. an email value object (VO)
- Your persistence engine choice can change the way you think about your entities and aggregates
- Ubiquitous language is contained within bounded contexts - e.g. "order" in the PO context vs. the logistics context
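To illustrate the "string email vs. email VO" point, a minimal Python sketch. The `Email` class, its deliberately simplistic pattern and `register_customer` are hypothetical names, not from the talk. Validation happens once, at construction, so anything typed as `Email` is known to be valid.

```python
import re
from dataclasses import dataclass

# Deliberately simplistic check (something@something.tld), for illustration only.
_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass(frozen=True)
class Email:
    """Value object: immutable, compared by value, valid by construction."""

    value: str

    def __post_init__(self) -> None:
        if not _EMAIL_PATTERN.fullmatch(self.value):
            raise ValueError(f"invalid email address: {self.value!r}")


def register_customer(name: str, email: Email) -> dict:
    # No re-validation needed: the type already guarantees a valid address.
    return {"name": name, "email": email.value}


if __name__ == "__main__":
    print(register_customer("Ada", Email("ada@example.com")))
    try:
        Email("not-an-email")
    except ValueError as err:
        print(err)
```

The pattern is only a placeholder (real email validation is much messier); the point is where the check lives, not what it checks.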
### Domain events
- Level 0 - CRUD
- Level 1 - Explicit operations
- In terms of business operations (UL)
- Level 2 - some operations are events
- Level 3 - CQRS & ES - event all the things (out of scope of this talk)
- Prevents feature creep - events help decouple - avoid integration issues
- Move event creation to the aggregate - e.g. Order::complete
- Treat events as an explicit concept - same as you're explicit with your types & VOs
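A sketch of moving event creation into the aggregate, assuming a hypothetical `Order` with an explicit `complete()` business operation (Level 1) that also records an `OrderCompleted` domain event (Level 2). Publishing the events is left out; `pull_events()` just hands them to whoever does that.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class OrderStatus(Enum):
    OPEN = "open"
    COMPLETED = "completed"


@dataclass(frozen=True)
class OrderCompleted:
    """Domain event, named in the ubiquitous language (past tense)."""

    order_id: str
    completed_at: datetime


@dataclass
class Order:
    order_id: str
    status: OrderStatus = OrderStatus.OPEN
    # Events raised by this aggregate that haven't been published yet.
    pending_events: List[OrderCompleted] = field(default_factory=list)

    def complete(self, now: Optional[datetime] = None) -> None:
        """Explicit business operation instead of a CRUD 'set status' update."""
        if self.status is OrderStatus.COMPLETED:
            raise ValueError(f"order {self.order_id} is already completed")
        self.status = OrderStatus.COMPLETED
        # The aggregate creates the event itself, so no caller can forget it.
        self.pending_events.append(
            OrderCompleted(self.order_id, now or datetime.now(timezone.utc))
        )

    def pull_events(self) -> List[OrderCompleted]:
        """Hand pending events to the caller (e.g. a publisher) and clear them."""
        events, self.pending_events = self.pending_events, []
        return events
```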
- REST is NOT CRUD over HTTP
- DDD modelling can help a lot here
- Aggregates - 3 important characteristics - identifiable, referable, scope of consistency
- Same as REST resources!
- Look at where your aggregates are, and shape your resources around your aggregate boundaries
- Representation design matters: always take the aggregate into account
- In REST you can represent this with hypermedia / HATEOAS
- For example, you can represent status this way: a client can cancel an order when a "cancel" link is available, rather than deciding based on a status ID
- Reducing the complexity of business decisions that a client has to make
- Reducing domain knowledge in the client
- This helps with API evolvability
- Key in a system of systems - allows deployment without forcing other systems to update as well - Blog post
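A sketch of state-dependent links on an order resource, so the client checks for a "cancel" link instead of interpreting status IDs. The statuses, link relations, URL layout and the plain `links` list are assumptions for illustration, not a specific hypermedia format such as HAL.

```python
from typing import Dict, List

# Hypothetical transition table: which link relations an order offers per status.
_TRANSITIONS: Dict[str, List[str]] = {
    "placed": ["cancel", "payment"],
    "paid": ["cancel"],
    "shipped": [],
}


def order_representation(order_id: str, status: str, base_url: str) -> dict:
    """Build a JSON-ready order resource whose links depend on its state."""
    self_href = f"{base_url}/orders/{order_id}"
    links = [{"rel": "self", "href": self_href}]
    # The server decides which business operations are currently allowed.
    for rel in _TRANSITIONS.get(status, []):
        links.append({"rel": rel, "href": f"{self_href}/{rel}"})
    return {"id": order_id, "status": status, "links": links}


def can_cancel(representation: dict) -> bool:
    """Client side: no status rules, just 'is there a cancel link?'."""
    return any(link["rel"] == "cancel" for link in representation["links"])
```

If the cancellation rules change (say, paid orders can no longer be cancelled), only the server's transition table changes and clients keep working, which is the evolvability point above.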
- How to develop a website with multiple teams?
- Different business units making a website that feels like one contiguous experience
- Frontend as a bottleneck
- "Decentralised Governance" gives teams the option to choose different tools (from James Lewis and Martin Fowler's microservices article)
- Mobile perf (the thing you were thinking of writing about)
- Transclusion: including all or part of an electronic document into one or more other documents by hypertext reference
- Expose a fragment resource, `/shopping-cart`, and consume it declaratively, like `<img src="">`
- See: Edge Side Includes, `<esi:include src="">`
- Server side, W3C Note
- Requires transpilation, supported by Akamai, Fastly etc.
- Allows you to cache the shit out of most of your page, and just reload dynamic elements, e.g. notifications
- See: `<h-include src="">`
- Client side library with custom elements, transitive includes, HTTP/2, lazy loading!
- Async, but has XHR lag
- Vanilla JS and polyfills only
- Can use both together to get the best trade-offs
- Now you have service dependencies: a fragment depends on its own CSS/JS
- Need cache busting
- Server side transclusion works well here
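A toy server-side transclusion step, to show the mechanics behind `<esi:include>`: scan a page for include tags and splice in each fragment, resolving transitive includes up to a depth limit. This is not an ESI processor (real ones run in CDNs and proxies and support much more); `fetch` is a stand-in for the HTTP call to the team owning the fragment, and the fragment paths in the usage are hypothetical.

```python
import re
from typing import Callable

# Matches <esi:include src="..."/> and <esi:include src="..."></esi:include>.
_INCLUDE = re.compile(r'<esi:include\s+src="([^"]*)"\s*/?>(?:</esi:include>)?')


def transclude(page: str, fetch: Callable[[str], str], depth: int = 0,
               max_depth: int = 3) -> str:
    """Replace each include tag with the fragment that fetch(src) returns."""
    if depth > max_depth:
        raise RecursionError("include nesting too deep")

    def resolve(match) -> str:
        fragment = fetch(match.group(1))
        # Fragments may include further fragments (transitive includes).
        return transclude(fragment, fetch, depth + 1, max_depth)

    return _INCLUDE.sub(resolve, page)
```

The depth limit guards against a fragment that (directly or indirectly) includes itself.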
- Dude wrote a thing
- Fast iterations for everyone (even on mobile)
- As an industry we need to push back against manual review (in app stores)
- This is "mobile waterfall"
- Couldn't release a feature that was only for Berlin, because the reviewers were in the US
- Bugfix? Half a week >:|
- Canary releases? Forget it
- Extensive E2E tests or manual QA mitigate bugs, but aren't really a solution, and just make the process slower
- Beta users (e.g. testflight) are good, but don't let you deploy 10 versions for canary testing
- web vs native all over again
- PWAs solve a lot of the cons of web apps :party:
- Hybrid applications have the cons of both (not counting React Native), but slightly quicker deployment cycles, as you can update the embedded web app. Also, perf sucks
- Crazy, but let's do it anyway
- Building your mobile app as a parser for your microservice's data
- Build it as configuration - when this event is received, perform these actions
- Ask the server what should happen each time - application logic as a service!
- This is fucking cool - basically your own DSL for your app
- You can choose your trade-offs:
- Performance vs. complexity
- Quicker iterations
- Cost efficiency
#### Pseudo-microservice for mobile
- When your application starts, get a list of services from the backend
- Register event handlers
- When the event occurs, query your service and perform the resulting actions
- This way you can even canary test multiple business logic workflows
https://github.com/waterlink/LikeyLikes https://github.com/waterlink/LikeyLikesStatic https://git.io/vDJKG
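A rough sketch of the pseudo-microservice idea: the app receives event-to-service mappings at startup, and when an event fires it asks that service which actions to perform. All names (`ConfigurableApp`, `demo_backend`, the action vocabulary) are hypothetical and not taken from the linked repositories; `backend` stands in for the real network call.

```python
from typing import Callable, Dict, List

Action = Dict[str, str]
# backend(service_name, payload) -> list of actions; in a real app, an HTTP call.
Backend = Callable[[str, Dict[str, str]], List[Action]]


class ConfigurableApp:
    def __init__(self, event_services: Dict[str, str], backend: Backend) -> None:
        # 1. On startup the backend tells us which events map to which service.
        self.handlers = dict(event_services)
        self.backend = backend
        self.screen: List[str] = []

    def on_event(self, event: str, payload: Dict[str, str]) -> None:
        # 2. Events the backend didn't register are ignored.
        service = self.handlers.get(event)
        if service is None:
            return
        # 3. Ask the service what should happen, then perform the actions.
        for action in self.backend(service, payload):
            self.perform(action)

    def perform(self, action: Action) -> None:
        # The app only ships a small, fixed vocabulary of primitive actions.
        if action["type"] == "clear":
            self.screen.clear()
        elif action["type"] == "show_text":
            self.screen.append(action["text"])
        else:
            raise ValueError(f"unknown action type: {action['type']}")


def demo_backend(service: str, payload: Dict[str, str]) -> List[Action]:
    """Fake backend: 'likes-v2' plays the canary workflow, anything else the old one."""
    if service == "likes-v2":
        return [{"type": "clear"},
                {"type": "show_text", "text": f"{payload['user']} liked your post"}]
    return [{"type": "show_text", "text": "Someone liked your post"}]
```

Canary testing a new workflow then means handing some users `likes-v2` instead of `likes-v1` at startup, with no app store release involved.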
- MSs are good
- Iterate quickly
- Smaller is simpler
- Better defined responsibilities & teams
- They mean
- More services
- More teams
- More communication
- Need for more documentation
- Problems
- What is available on the platform?
- Platform architecture?
- Who is responsible for a service?
- Wikis are shit
- You could try to collect metadata from your microservices
- API discovery via crawling swagger / openAPI
- https://github.com/zalando-incubator/api-discovery
- Added a dropdown list to swagger UI to be able to browse docs for all APIs
- Before:
- YAML files kept up to date
- Wiki generated from these files
- But search was limited, no immediate benefit
- Then came up with pivio.io
- Who owns what
- Service registry
- Fancy diagrams
- Searchable (elasticsearch)
- Built-in query language
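A minimal in-memory sketch of the questions a metadata catalog answers: who owns a service, what depends on it, and a search over names and descriptions. The fields and class are hypothetical; pivio has its own document format and uses Elasticsearch for search, which this naive substring match only imitates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceMetadata:
    # Hypothetical fields, not pivio's actual document format.
    name: str
    owner: str
    description: str
    depends_on: List[str] = field(default_factory=list)


class ServiceCatalog:
    def __init__(self) -> None:
        self._services: Dict[str, ServiceMetadata] = {}

    def register(self, meta: ServiceMetadata) -> None:
        # Each service supplies its own metadata, so the catalog stays current
        # instead of relying on a hand-edited wiki.
        self._services[meta.name] = meta

    def owner_of(self, name: str) -> str:
        return self._services[name].owner

    def dependents_of(self, name: str) -> List[str]:
        """Who is affected if `name` goes down?"""
        return sorted(s.name for s in self._services.values() if name in s.depends_on)

    def search(self, term: str) -> List[str]:
        """Naive case-insensitive substring search over names and descriptions."""
        term = term.lower()
        return sorted(
            s.name
            for s in self._services.values()
            if term in s.name.lower() or term in s.description.lower()
        )
```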
- Consider your API as a UI with developers as the users
- Or machines! :robot_face:
- Start with a really good idea
- A fancy API won't rescue a useless microservice
- Don't be afraid to throw stuff out
- Match your system to the real world
- Ubiquitous language and DDD
- Don't reinvent the wheel
- Follow standards
- Share patterns, traits and schemas - so paging, filtering etc. is always done the same way
- Internal consistency
- The same action should work the same way
- Pattern library
- Consistent error codes & messages (drink)
- Naming conventions (drink)
- Prevent errors
- Make it very hard to make mistakes
- Validation rules should go in API definition
- Be tolerant when reading input
- Minimalist design
- Don't put more stuff in your API than you need
- Ask for the bare minimum, and avoid redundancy
- Use references instead of the full dataset
- Help and documentation
- Ensure reality matches docs
- Make it easy for people to read (good structure & search, up to date)
- Explain how to recover from errors
- Break the rules
- When standards and usability disagree, follow usability
- Remain consistent
- Know why you are breaking the rules, master the rules first
- Don't justify your design
- If your users don't like it, change it
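For "consistent error codes & messages", one option is to standardise on a shared error shape. This sketch uses the member names from RFC 7807 (Problem Details for HTTP APIs, since superseded by RFC 9457): `type`, `title`, `status`, `detail`, `instance`, plus extension members. The helper names and the example problem type URL are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple

PROBLEM_JSON = "application/problem+json"  # media type registered by RFC 7807


def problem(status: int, title: str, detail: str, type_uri: str = "about:blank",
            instance: Optional[str] = None, **extensions: Any) -> Dict[str, Any]:
    """One error shape for every API, using the RFC 7807 member names."""
    body: Dict[str, Any] = {"type": type_uri, "title": title, "status": status, "detail": detail}
    if instance is not None:
        body["instance"] = instance
    # Extension members carry problem-specific data, e.g. per-field errors.
    body.update(extensions)
    return body


def to_http_response(body: Dict[str, Any]) -> Tuple[int, Dict[str, str], str]:
    """Status code, headers and serialised body, ready for any web framework."""
    return body["status"], {"Content-Type": PROBLEM_JSON}, json.dumps(body)
```

Putting the error shape in a shared schema also helps "Explain how to recover from errors": the `detail` and extension members are where that guidance goes.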
- smartlaw.de
- White label config (different tenant, same environment, so different config on a per-request basis)
- This one is boring as fuck and the guy is in a suit
- But at least he has a tag cloud
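A sketch of white-label configuration resolved per request: one deployment, one environment, and the tenant derived from the `Host` header. The tenant table, hostnames and fields are made up for illustration, and distinguishing tenants by hostname is an assumption, not something the talk specified.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class TenantConfig:
    brand_name: str
    support_email: str
    primary_color: str


# Hypothetical tenants; in practice this could live in a config service or DB.
TENANTS: Dict[str, TenantConfig] = {
    "legal.example.com": TenantConfig("Example Legal", "help@legal.example.com", "#003366"),
    "partner.example.org": TenantConfig("Partner Law", "support@partner.example.org", "#8b0000"),
}


def config_for_request(headers: Mapping[str, str],
                       tenants: Mapping[str, TenantConfig] = TENANTS) -> TenantConfig:
    """Same deployment and environment; the tenant is derived from each request."""
    # Strip an optional port and normalise case ("Legal.Example.com:443").
    host = headers.get("Host", "").split(":")[0].lower()
    try:
        return tenants[host]
    except KeyError:
        raise LookupError(f"unknown tenant host: {host!r}") from None
```

Header names are matched case-sensitively here for brevity; real web frameworks normalise them.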
- Most of the problems with microservices are connected with people and organisational systems
- Strategy: situational awareness
- Are microservices a good fit?
- Middle management are latching onto microservices as a buzzword to brand things
- Lipstick on the pig
- Not understanding architecture principles
- Build around business functionality (DDD)
- Creating mini monoliths (12 factor!)
- No well defined devops
- Deployment / ops free for all
- Microservices are not a silver bullet if you have these problems
- Determine business goals, hypothesise, choose tech, and validate
- So what are our goals?
- Delivering value
- Business agility
- Safer, more rapid changes to software
- Why jump to microservices? CI/CD/DevOps, value stream should come first. Important foundation
- Wardley maps: useful technique to understand business and technical landscapes
- You have to know where you are before you can decide where to go
- Choose tooling to support your approach - don't change your approach because of your choice of tooling
- Define goals
- SMART goals
- Get your stakeholders involved
- Communicate the vision
- Map strategic goals to architecture goals
- Map these back to development and practice goals
- DO UBIQUITOUS LANGUAGE RIGHT ARGH
- Technical leadership is vital
- Promote shared understanding - it's about communication
- Do proper risk management
- "Just enough" up-front design - how much architecture is just enough?
- Conway's law is well accepted, but it's not so clear where architects sit
- Overarching, consulting, or per team?
- PO having final say is bad - three amigos pattern
- Technical insanity antipattern
- We created a tech mess
- But no change is required???
- We need strong technical lead
- If you can't create a well-architected monolith what makes you think you can make microservices???!
- InnerSource
- Encourages sharing and documentation
- Reduces tribal knowledge
- Draw a map, rather than copy the territory
- Quality guidelines come baked into testing
- Evaluating tooling
- Spine model
- We get stuck at a dogmatic level with the tooling; the spine model helps you decide what you value rather than making knee-jerk reactions
- Bias is super real
- Thinking, Fast and Slow (Daniel Kahneman): https://www.amazon.de/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
- Antipattern: Technical envy
- Blindly copying e.g. Netflix, Spotify
- Learn the context, principles, practices, culture
- Understand the advantages and drawbacks
- Feedback - visibility and constant learning
- Business, architecture, operations
- Business:
- Dashboards and metrics are useful
- Microservices should be business-driven
- Validate hypotheses
- Share metrics regularly
- Show the benefit and business value the microservices bring
- Architectural feedback
- Your code as a crime scene (Adam Tornhill)
- Visualise churn and complexity
- Work out where the "slums" of your code are
- Same thing for code quality
- Antipattern: Trojan monoservice
- When you accidentally make a monolith
- Matthew Skelton: Types of software monolith
- Continually retrospect on technical work using supporting metrics
- Operational visibility
- Logging, monitoring, alerting (drink!)
- When bad things happen, people are always involved
- Mikey Dickerson & healthcare.gov
- A little standardisation goes a long way
- Automation is the goal
- Understand problems with postmortems, get at the root cause
- Checklists provide structure
- When done well, microservices enable agility
- But if you don't build in signals and metrics, and you don't have the data and adapt to it, then there's limited benefit
- Responsibilities
- Just change to squads, chapters, guilds. Problem solved!
- Learn from conway, netflix, spotify, but don't blindly cargo cult
- Devops
- Antipattern: The fullstack myth
- Define responsibilities, who owns what (gitlab: nobody owned backups)
- Focus on what matters: for microservices, you need to have devops nailed down first
- Top: CI/CD - how much value is there in non-deployed code?
- Antipattern: Water-micro-fall
- The "perfect" microservice
- Not validating assumptions
- Change mindset to continuously deliver incremental changes to production asap
- Dancing skeleton: Get something super simple through a pipeline to production ASAP
- Change management is essential
- Transformation is a process, you can't buy devops
- Leading Change by John P Kotter
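For the "your code as a crime scene" idea above, a sketch of hotspot ranking: change frequency multiplied by size, with lines of code as a crude complexity proxy. The input is assumed to be a list of changed files per commit (extractable from `git log --name-only`) plus a per-file line count; the file names in the usage are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def hotspots(commits: Iterable[List[str]], lines_of_code: Dict[str, int],
             top: int = 3) -> List[Tuple[str, int]]:
    """Rank files by change frequency x size (a crude complexity proxy)."""
    # Count each file at most once per commit.
    churn = Counter(path for files in commits for path in set(files))
    scores = {path: count * lines_of_code.get(path, 0) for path, count in churn.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]


if __name__ == "__main__":
    commits = [["billing.py", "api.py"], ["billing.py"], ["billing.py", "README.md"], ["api.py"]]
    loc = {"billing.py": 900, "api.py": 300, "README.md": 50}
    print(hotspots(commits, loc))
```

Large files that rarely change, or small files that change constantly, score lower than big files that keep changing; those are the "slums" worth looking at first.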
## Authorization and Authentication in microservice environments
### Problem
- Log in and see the UI, but the UI might be powered by different microservices
- How do these know what the user is allowed to do?
- JWT can help
- Log into auth service, get JWT, UI sends JWT to microservices
- Microservices can check token validity themselves (signature) - they don't need the auth service anymore
- Two types, JWS / JWE
- JWS
- Three parts: header, payload (claims), signature
- Header contains algorithm that was used
- Payload contains iss, exp, sub... you know this stuff
- Signed with a secret (a shared HMAC key, or a private key for RSA/ECDSA)
- JWE
- Five parts:
- Header
- encrypted key
- symmetric
- encrypted with shared secret
- initialisation vector (salt)
- cipher text
- encrypted payload
- encrypted with enc algorithm
- encrypted using initialization vector
- auth tag
- also a result of enc algo
- ensures integrity
- Two additional keys in the header:
- enc: Encryption algorithm for the cipher text
- zip: compression algorithm
- Pro: Everything is unreadable to the user
- Con: Have to distribute private keys to microservices
## Secure Microservices Adoption
- By isolating services, we isolate security risks
- Not every service is equally important, some are higher value targets
- Isolated services reduce overall security risks
- Problem: End users need to interact with multiple services
- Problem: End users frequently include many roles
- Solution 1: API gateway
- Solution 2: Backend for frontend for different end users
- Trust boundary: at this point you need Authentication, Authorization, validation
- Different use cases fit different authN solutions
- internal boundaries
- Maybe your service shouldn't access other, more sensitive services (privilege escalation)
- Is this client allowed to access this entity?
- Customer can't modify order for another customer
- Secret management software should:
- Store & transfer encrypted
- Audit all access
- Rotate automatically
- Fine grained ACL
- More teams
- Smaller (2 pizza)
- More independent
- More trust boundaries
- Speed to production
- Mo' processes mo' problems :meth-mo:
- OAuth2 to the rescue
- One token to rule them all
- Danger!
- The token is too powerful - can do anything to the system as that user
- Only limit is expiry
- Token leakage is a big deal
- Client credentials grant type can be used internally
- Don't pass the user's JWT around; the resource can get its own token representation
- Match tokens to your org's trust boundaries
- Teams maybe don't fully trust each other, apps perhaps shouldn't either
- Proposal for new OAuth2 grant type: token exchange
- Given actor + subject + audience, get a new token
- Policy decision given caller, user and intent
- New token expresses these
- Given actor + previous token + audience, get a new token
- Policy decision based on delegation chain (call stack)
- Now we can take internal trust boundaries into account
- Pros:
- User, client, call stack are part of policy decision
- can request very limited power tokens (audience & scope)
- Trust boundaries are unambiguous, all information is present to the auth server
- Centralised policy management
- Cons:
- Network and auth server overhead
- Security vs. performance tradeoff
- Token caching & reuse
- Policy management vs. agility
- Dude made a thing, it's all java
- JWT-ception
- Single use JWT
- 1 aud, 1 op
- Embed JWT inside JWT for nested microservice calls
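A sketch of building a token exchange request body. The parameter names and URNs are from the proposal as later published in RFC 8693 (OAuth 2.0 Token Exchange); the token values, audience and scope are hypothetical. It shows the "actor + subject + audience" variant: the calling service presents the user's token plus its own, and asks for a narrowly scoped token for one downstream service.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

TOKEN_EXCHANGE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_body(subject_token: str, audience: str, scope: str,
                        actor_token: Optional[str] = None) -> str:
    """Form-encoded body for a POST to the auth server's token endpoint."""
    params: Dict[str, str] = {
        "grant_type": TOKEN_EXCHANGE,
        # Whose behalf the call is made on (the end user).
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        # Ask for a token that is only valid for one downstream service.
        "audience": audience,
        "scope": scope,
    }
    if actor_token is not None:
        # The calling service identifies itself, so the policy decision
        # can take the delegation chain into account.
        params["actor_token"] = actor_token
        params["actor_token_type"] = ACCESS_TOKEN_TYPE
    return urlencode(params)
```

The body would be POSTed as `application/x-www-form-urlencoded` to the token endpoint, with the calling service authenticating as a client; the auth server then decides, per trust boundary, whether to issue the narrower token.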
- The challenge of a postmodern software developer
- Engineer of your own problems
- I'd like to be writing some code, but I have to do other stuff
- Navigating VCS
- Clicking things in the browser
- Jumping back and forth a LOT
- CI and CD
- DevOops
- With microservices this is x100
- Alt+Tab should not be a key skill!
- We have great tools out there
- But they're not really aware of each other
- They also can't read your mind
- You're the one who has to map that back to your context
- Modern software development is a cognitive overhead problem
- How many microservices do you need to comprehend to do your work?
- This is the reason for SRP - reduce cognitive overhead!
- But we keep creating the problem for ourselves
- So:
- What is of use?
- What should I notice and what should I ignore?
- Where do I need to go to get it?
- Bring in information at the right time, right to your eyeballs, as and when you need it
- Shouldn't have to go and get it
- No more "have you seen X?", where X is a critical issue that's lost in the noise
- Actionable information at all times
- Microservice systems become big data problems in their logging alone
- What's the bare minimum I need to see?
- What extra information can be used to enrich that?
- We've been creating highly complex systems that seem to hate human beings for a long time
- How to solve these problems?
- Chat? (slack)
- No wait, far too much chat
- Noise amplification system
- Too many integrations, too much giphy
- Jet cockpit with all the dials
- Please please no more slack
- It has to be more than just show
- Chat can end up a nightmare, but with the right thinking you can turn it into much more
- It's a lousy dashboard, but a great way to collaborate
- Get several people aware of something and working on it
- Let's not turn slack into a shitty dashboard, it's not about pushing information in there. It's about doing stuff.
- Observe
- Make me aware
- Notify me just when I need to know
- Orient
- Show me what I need to know
- Supporting information
- Who can I talk to? Who made the last commit?
- Bring them in
- Decide
- Help me figure it out
- Where should I look?
- Act
- Help me act
- Chat is a great system for enabling an OODA loop for your system
- Forget all the hype, use it for collaboration - what it's designed for
- Visibility and Control to Automate Software Development
- Less busywork
- A key skill in microservices is creating a new project
- Get it in CI
- Get it deployed
- etc.
- Make this as easy as possible
- Often you walk into a project that's already there; making something new is a key skill and takes much longer than you think
- Atomist - make sense of your software development flow
- Tighten your OODA loop inside a collaborative environment (slack)
- Integrate all the things so you can get the right information to the right eyeballs and the right actions to get stuck in
- Less yak shaving
- Normal chat: STUFF IS GOING ON
- @atomist: list issues
- Takes the deluge of info and helps you make decisions
- @atomist: create issue
- Uses slack thread (neat idea)
- Gives you buttons for actions
- Collaborative
- E.g. a Travis build... then a release button appears
- Rug files (DSL) or TypeScript for configuration
- Also has editors for rewriting / linting code
- Done through slack UI again :D
- If the building blocks are already there it's easier to glue them together
- Pedro got sherlocked
- Divide and conquer
- Independent, stable teams building independent services
- If you want to go fast, go alone
- Think global, act local
- Global vision to align
- Shared values to set focus
- Local decisions - microservices, not micromanagement
- Decentralised planning
- Microteams, microservices, microwins
- "billing" other teams for their services
- "Cost" and service on a team level rather than org, so it's proportional
- Planning, prioritising, and saying no
- Made a proto-persona
- Put him on the technology adoption bell curve from "Crossing the Chasm"
- Say no to those too far over the chasm: you can't develop everything at the same time, and one sector is more important
- Build-measure-learn-add more customers
- Serendipity strikes again when reading The Innovator's Dilemma
- Divide and conquer
- Be friends with your fans
- Deliberate ignorance
- Optimised processes
- Small teams big wins
- 5 challenges
- Companies depend on customers and investors for resources
- Markets that don't exist can't be analysed
- Technology supply may not equal market demand (minidisc)
- An organisation's capabilities define its disabilities
- Small markets don't solve the growth needs of large companies (but divide and conquer, feel more important)
- With great power comes great responsibility
- How we moved 100s of VMs into containers
- How we deploy a distributed database into production (digitalocean/vulcan, port of Prometheus)
- Requirements:
- 3Gbps traffic
- 20TB storage
- <100ms read times
- 100k write ops
- Prior: Everything was a VM
- Arduino: 1 process -> kernel (processes etc)
- Distributed scheduler provides:
- Container deploy
- v memory deploy
- memory quota
- Disk storage
- Networking
- CPU scheduling
- E.g. Mesos
- Deployed everything to Mesos, no more devops... everything broke
- Distributed: different applications will run on different nodes and you don't care
- Kafka as a custom scheduler on top of Marathon on top of Mesos on top of Linux
- dafuq
- Mesos killed the Kafka master and deleted all the data :D
- Deleted Mesos :D
- Found HashiCorp Nomad instead
- has its own gossip protocol so that there's no master to be down
- If one datacenter goes off it still works
- However nobody uses it and they don't handle state
- Kubernetes instead
- Can hide complexity
- Kubelets make things more resilient than Mesos
- Kube yml is verbose
- Really good CLI
- Upsides to distributed schedulers:
- No more devops, just throw your containers up there and be done
Everything has an end, only the sausage has two (Alles hat ein Ende, nur die Wurst hat zwei)