Matt Stine
War stories on how he has gone about evaluating services over the past years.
Topics
- Rationale
- Evaluation categories
- Business case
- Resiliency
- Security
- Regulatory compliance
- Economics
- Scalability
- Provider "lock-in"
- Available tooling
- Undifferentiated heavy lifting
- Differentiating features
- How to create a scorecard based on some examples.
Cloud Native Landscape https://github.com/cncf/landscape
There are a lot of things!
What business problem are we trying to solve?
How does this type of service address this problem?
- e.g. relational vs. columnar vs. document vs. key-value databases
What are your resiliency requirements?
- Do NOT just say "It needs to be 'HA'"
- False dichotomy between HA and non-HA
Components of Resiliency
- "Available" means the system is functioning as designed
- "Consistent" means you get the same response for the same request
- weak
- eventual
- "Partition tolerance" - network partition
- the system can function if internal communication is disrupted
- "Durability"
- security is NOT binary
- how does a client prove their identity?
- how are credentials provisioned/stored?
- how are credentials delivered?
- how are credentials rotated? - avoid long-lived tokens (passwords? certificates?)
- What permission types are supported?
- Are permissions grouped into roles?
- Are roles customizable?
- How are roles assigned to actors?
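The permission questions above can be made concrete with a tiny role model. This is a minimal sketch, not any provider's API: the `Role` and `Actor` classes, the permission strings, and the actor names are all hypothetical and exist only to show permissions grouped into roles and roles assigned to actors.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Role:
    """A named group of permissions (hypothetical model)."""
    name: str
    permissions: FrozenSet[str]


@dataclass
class Actor:
    """A client identity that holds zero or more roles."""
    name: str
    roles: List[Role] = field(default_factory=list)

    def can(self, permission: str) -> bool:
        # An actor is allowed if any of its roles grants the permission.
        return any(permission in role.permissions for role in self.roles)


# Illustrative roles and actors; the names are made up for this sketch.
reader = Role("reader", frozenset({"queue:read"}))
operator = Role("operator", frozenset({"queue:read", "queue:purge"}))

alice = Actor("alice", [reader])
bob = Actor("bob", [operator])
```

When evaluating a service, it is worth checking whether its own model is at least this expressive, and whether roles such as `operator` can be customized rather than only chosen from a fixed list.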
Most software running in the world is not very secure. Security people don't sleep very well :)
- "Residency" - where does the data live
- "Sovereignty" - who has control of the data, and what rules and laws apply to the data
- Germans' data cannot live outside of Germany
Encryption can make you more or less secure.
Compliance tells you how much encryption is needed to be legal.
- 'Data at rest' is NOT binary, encryption as it sits on the filesystem
- Who owns the data? Who can see it? Who has the keys to the data?
- 'Data in flight': encryption as data is transferred over the network
- What happened?
- When did it happen?
- What actor caused it?
- Where did it happen?
- Why did it happen?
- HIPAA
- SOX
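The five audit questions above (what, when, what actor, where, why) map naturally onto one structured record per event. The sketch below is an assumption-laden illustration: the `AuditEvent` class, its field names, and the sample values are hypothetical, and a real service will have its own audit log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One audit record answering the five questions (hypothetical schema)."""
    what: str       # what happened
    when: datetime  # when it happened (timezone-aware)
    who: str        # what actor caused it
    where: str      # where it happened
    why: str        # why it happened

    def to_json(self) -> str:
        record = asdict(self)
        # datetime is not JSON-serializable, so store it as ISO 8601 text.
        record["when"] = self.when.isoformat()
        return json.dumps(record, sort_keys=True)


# Sample values are invented for illustration only.
event = AuditEvent(
    what="secret.rotated",
    when=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    who="svc-deployer",
    where="eu-central/payments-db",
    why="scheduled rotation",
)
```

A useful evaluation question is whether the service's audit trail lets you fill in all five fields, or whether some (often "why") have to be reconstructed from other systems.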
Who is operating the service? If something goes wrong, whose job is it to fix it?
What is your expected rate of consumption?
How is the service priced/costed? How is money spent to run the service?
Is the equation cost effective as a function of consumption and growth rate?
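One way to reason about cost as a function of consumption and growth rate is to project it month by month and compare pricing models. The sketch below assumes a simple "fixed fee plus price per unit" model with compounding monthly growth; the function names and all the numbers are hypothetical, and real pricing usually has tiers and minimums that this ignores.

```python
from typing import List, Optional


def projected_monthly_costs(initial_units: float, monthly_growth: float,
                            unit_price: float, fixed_fee: float,
                            months: int) -> List[float]:
    """Cost per month, with consumption compounding by monthly_growth."""
    costs = []
    units = initial_units
    for _ in range(months):
        costs.append(fixed_fee + units * unit_price)
        units *= 1 + monthly_growth
    return costs


def first_month_exceeding(costs: List[float],
                          baseline: List[float]) -> Optional[int]:
    """Index of the first month where costs exceed baseline, else None."""
    for month, (cost, base) in enumerate(zip(costs, baseline)):
        if cost > base:
            return month
    return None


# Illustrative numbers only: 1000 units growing 10% per month.
pay_per_use = projected_monthly_costs(1000, 0.10, 0.50, 0.0, 12)
flat_rate = projected_monthly_costs(1000, 0.10, 0.0, 600.0, 12)
crossover = first_month_exceeding(pay_per_use, flat_rate)
```

The crossover month is the point where the cheaper-looking option stops being cheaper, which is the part of the equation that growth rate changes.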
The system can maintain performance as usage ramps up.
Most software doesn't need to scale.
How is your load/volume expected to grow?
Is your load/volume consistent or bursty? Is it predictable?
You're always locked into something, even if you build it yourself.
How easily can you change your architecture?
Is there a sensible way to leverage multiple providers? Often no because of expenses.
Is the service supported by open/de facto standards?
Is there a meaningful abstraction layer? This helps us make changes more easily in the future.
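A layer that insulates application code from one provider can be as small as an interface your code owns. The sketch below is a hypothetical example: `BlobStore`, `InMemoryBlobStore`, and `archive_report` are invented names, and a real adapter would wrap a specific provider's client library behind the same two methods.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """The only storage surface the application is allowed to depend on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in implementation; a provider adapter would expose the same methods."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_report(store: BlobStore, report_id: str, body: str) -> str:
    """Application code: knows about BlobStore, not about any provider."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

Swapping providers then means writing one new adapter rather than touching every caller, although it does nothing about the cost of moving the data itself.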
Are you subject to "data gravity"? How easy is it to move data around? You are likely to never move it out.
How good is the documentation?
How persistent is the documentation?
Does the service have a "well-designed" API? Do the abstractions map to yours?
Are client libraries available for your language(s) of choice?
Does your app framework of choice (Spring) support the service?
Is good management tooling available?
Is there a management API?
Is there automation tooling available for management?
Do you think the company is resilient? Will they suddenly go out of business? What happens in that case?
There are gaps between what the service provides and what you need it to do.
How will you close those gaps?
How much will it cost? How much will it cost to keep them closed?
Are there partners (consultants) you can utilize to help close these gaps?
Is the provider trying to close the gaps you have? Will your fillings be fleeting?
There is a lot of parity out there!
Are they willing to sign my company's BAA?
What if the provider is also a competitor in some fashion?
How well does this solution play with my other solutions?
When you pick the best of breed for each box do you really get the best overall solution?
Look at things in the aggregate!
- keep them simple
- simple ranges 1-3, 1-5
- Stay away from false dichotomies
- weight priorities
- call out subcategories when valuable
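The scorecard guidelines above (simple ranges, weighted priorities) can be sketched as a weighted average. Everything specific here is an assumption for illustration: the category names are taken from the topic list, but the weights, the two providers, and their scores are invented.

```python
from typing import Dict, Tuple

# Hypothetical weights: higher means the category matters more to you.
CATEGORY_WEIGHTS: Dict[str, int] = {
    "resiliency": 3,
    "security": 3,
    "economics": 2,
    "tooling": 1,
}


def weighted_score(scores: Dict[str, int], weights: Dict[str, int],
                   scale: Tuple[int, int] = (1, 5)) -> float:
    """Weighted average of per-category scores on a simple fixed range."""
    low, high = scale
    for category in weights:
        if category not in scores:
            raise KeyError(f"missing score for {category!r}")
        if not low <= scores[category] <= high:
            raise ValueError(f"{category!r} score must be in {low}-{high}")
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


# Invented scores on a 1-5 range for two made-up candidates.
provider_a = {"resiliency": 4, "security": 5, "economics": 2, "tooling": 3}
provider_b = {"resiliency": 3, "security": 3, "economics": 5, "tooling": 5}

score_a = weighted_score(provider_a, CATEGORY_WEIGHTS)
score_b = weighted_score(provider_b, CATEGORY_WEIGHTS)
```

With these weights, `provider_a` edges out `provider_b` despite scoring lower on economics and tooling, which shows how the weighting, not the raw scores, encodes your priorities.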
Documentation often seems to be, and often is, transient. When we have to support and extend systems for years, how should we capture documentation for the versions we run now?