-
Multi-Cloud Decisions
Key Enablers
- Workload Portability
- Ability to negotiate with suppliers
- Ability to select best tool for a given job
Keys
- Visibility - trusted single source of truth
- Efficiency - across dev, QA, security and operations
- Governance - automated security, code quality, vulnerability management, policy enforcement
-
Use Managed Services
-
Every Engineer should be a Cloud Engineer
-
Keep Scalability in mind but don't overdo it
- Does everything have to scale automatically?
- Know the upper bound
- Gather usage data, continue testing and planning
- Identify limiting factors e.g. database
- Ability of code to take advantage of more CPU/Memory
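Whether code can take advantage of more CPUs can be estimated with Amdahl's law, which bounds the speedup by the share of work that must run serially. A minimal sketch follows; the function name and the example numbers are illustrative assumptions, not measurements from a real system.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel.

    parallel_fraction: share of the runtime that parallelizes (0.0 to 1.0).
    workers: number of CPUs or instances applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime parallelizes.
    for n in (2, 4, 16, 64):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With a 10% serial share the speedup can never exceed 10x, no matter how many workers are added. That is one way to put a number on "know the upper bound".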
-
Containers aren't magic
- Use container scanning services to periodically check image contents for the latest known vulnerabilities
- Create secure containers and run them in a strict container sandbox
-
Re-Platform every 5 to 10 years
- Increased velocity
- Future proofing
- Scalability
- Security
- Efficiency
- Community backing
-
Visualize distributed systems
-
Serverless bad practices
- Deploying a lot of functions - increases size, complexity and maintenance
- Calling functions asynchronously - asynchronous calls increase the complexity of a system. Costs also increase, because a response channel and a serverless message queue are required
- Employing many libraries - increases warm-up time
- Using many technologies - requires people with skills in all of them
- Not documenting functions
-
Topology
- Modularity - separation of concerns
- Deployment Strategy e.g. Canary, blue/green
- Datacenter affinity - active/active, active/passive
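A canary deployment sends a small, stable share of traffic to the new release. One possible way to do the split is to hash a user ID into a bucket. The sketch below assumes a string user ID; `canary_bucket` and `pick_version` are hypothetical names, and in practice a load balancer or service mesh usually handles this.

```python
import hashlib


def canary_bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def pick_version(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly canary_percent of users, else "stable"."""
    return "canary" if canary_bucket(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    # The same user always lands on the same version for a given percentage.
    for uid in ("alice", "bob", "carol"):
        print(uid, pick_version(uid, canary_percent=10))
```

Because the bucket depends only on the user ID, raising the percentage moves more users onto the canary without reshuffling those already on it.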
-
Understand how services work under the hood - e.g. Lambda cold start times, running out of IPs
-
Failing cloud migration
- Not optimizing for the cloud - doing just lift and shift
- Lack of architectural strategy - downtime management, latency, data migration etc.
Antipatterns
- Wild west - each BU buying its own logging and monitoring solutions and running differing CI/CD workflows
- Command & Control - Ticketing process for cloud
-
Security is Essential
-
Automation is required
-
Secrets
- Know where the secrets are kept
- Audit secrets - rotation, revocation
- Encryption
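A rotation audit can be as simple as comparing each secret's last rotation date against a maximum age. The sketch below works on an in-memory inventory; `SecretRecord` and its fields are assumptions made for illustration and do not match any particular secrets manager's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SecretRecord:
    """Hypothetical inventory entry for one secret."""
    name: str
    last_rotated: datetime
    revoked: bool = False


def secrets_needing_rotation(records, max_age, now=None):
    """Return names of active secrets whose last rotation is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [
        r.name
        for r in records
        if not r.revoked and now - r.last_rotated > max_age
    ]


if __name__ == "__main__":
    inventory = [
        SecretRecord("db-password", datetime(2023, 10, 1, tzinfo=timezone.utc)),
        SecretRecord("api-key", datetime(2024, 1, 15, tzinfo=timezone.utc)),
    ]
    print(secrets_needing_rotation(inventory, timedelta(days=90)))
```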
-
Never take a single region dependency
- Redundant storage
- Multiple availability zones (AZs)
- Backup
- Recovery
- Failovers should be automatic
- Practice failovers
-
Monitoring with Visualizations/Dashboards
-
Incident Analysis and Chaos Engineering
-
Monitoring
- Functionality
- Usage patterns
- User Experience
- Security
- Billing
- Health Status
-
KISS It
- Avoid premature optimizations
- Start small and use MVPs to guide design decisions
- Read documentation - pay attention to limits and error codes
- Focus on learning best practices
- Use standard naming conventions
- Delete unused cloud resources to remove clutter
- Find system failure scenarios and provide runbooks
-
Maintain Service Levels with Feature flags and circuit breakers
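A circuit breaker stops calling a failing dependency for a while so that callers fail fast instead of piling up. The sketch below is a minimal single-threaded version; the class name, the threshold and the timeout are illustrative assumptions, and a production system would more likely use an established library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically catch `CircuitOpenError` and serve a fallback, for example a cached response or a feature switched off by a flag.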
-
Design First, then code
-
Strategies to cope with duplicates
- Stateless consumers
- Keeping state - use TTL
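When a consumer has to keep state to detect duplicates, a TTL keeps that state from growing without bound. The sketch below remembers message IDs in memory for a fixed window. `TtlDeduplicator` is a hypothetical name, and a shared store with key expiry would be needed once there is more than one consumer instance.

```python
import time


class TtlDeduplicator:
    """Remember message IDs for ttl_seconds and flag repeats within that window."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._seen = {}         # message_id -> expiry timestamp

    def is_duplicate(self, message_id):
        now = self.clock()
        # Drop expired entries so memory use stays bounded by the TTL window.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if message_id in self._seen:
            return True
        self._seen[message_id] = now + self.ttl
        return False


if __name__ == "__main__":
    dedup = TtlDeduplicator(ttl_seconds=60)
    for msg_id in ("m1", "m2", "m1"):
        if dedup.is_duplicate(msg_id):
            print("skipping duplicate", msg_id)
        else:
            print("processing", msg_id)
```

The TTL should be at least as long as the window in which the messaging system may redeliver a message. A duplicate that arrives after its entry has expired is processed again.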
-
Avoid big re-writes
Risks
- Not making deadlines
- Going over budget
- Burning out team members
- Losing stakeholder confidence
Steps
- Be realistic
- Use the strangler pattern - incrementally replace an existing system by extracting parts of it gradually
- Repeat
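The strangler pattern is often implemented as a routing facade in front of the old system: paths that have been migrated go to the new code and everything else still reaches the legacy code. In the sketch below, the handlers and the `/orders` prefix are hypothetical placeholders.

```python
def legacy_handler(path: str) -> str:
    """Stand-in for the existing system."""
    return f"legacy:{path}"


def new_orders_handler(path: str) -> str:
    """Stand-in for a part that has already been extracted."""
    return f"new:{path}"


# Grows one entry at a time as parts of the legacy system are extracted.
MIGRATED_PREFIXES = {
    "/orders": new_orders_handler,
}


def route(path: str) -> str:
    """Send migrated paths to the new code and everything else to legacy."""
    for prefix, handler in MIGRATED_PREFIXES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return handler(path)
    return legacy_handler(path)


if __name__ == "__main__":
    print(route("/orders/42"))
    print(route("/users/7"))
```

Each extraction adds one entry to the routing table, and removing the entry rolls it back.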
-
QA is also feedback, early feedback
-
FinOps
- Make finance and procurement part of the planning process
- Provide guardrails for shared financial accountability
- Design and architect with finance in mind
- Use financial tracing to align cloud spending to product and customer metrics
- Provide real-time visibility of cloud spending for consuming teams
How it can be done
- Offer visibility for everyone
- Identify your cost drivers and metrics
- Have guardrails for the cost-control policy
-
Make sure you are watching and measuring your costs
-
Set billing alarms
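Cloud providers offer billing alarms natively. The logic behind them can be sketched as a comparison of month-to-date spend against fractions of a budget. The function name and the threshold values below are illustrative assumptions, not any provider's defaults.

```python
def triggered_alarms(month_to_date_spend, monthly_budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that the current spend has reached or passed."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    ratio = month_to_date_spend / monthly_budget
    return [t for t in thresholds if ratio >= t]


if __name__ == "__main__":
    # Hypothetical figures: 850 spent against a budget of 1000.
    for threshold in triggered_alarms(850, 1000):
        print(f"alert: spend has reached {threshold:.0%} of budget")
```

Several thresholds give early warning, so a team is not only told once the budget is already spent.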
-
Leverage a content delivery network
-
Stay within an availability zone (or region) in places where you are not looking to improve availability. Needless region-to-region costs are a killer
-
Leverage data compression
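Compressing payloads before they cross the network or land in storage reduces both transfer and storage charges. The sketch below uses Python's standard `gzip` and `json` modules. The helper names and the sample records are made up for illustration, and the actual saving depends on the data.

```python
import gzip
import json


def compress_json(records) -> bytes:
    """Serialize records to JSON and gzip the result."""
    return gzip.compress(json.dumps(records).encode("utf-8"))


def decompress_json(blob: bytes):
    """Reverse compress_json."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Repetitive, log-like sample data compresses well.
    sample = [{"id": i, "status": "ok"} for i in range(1000)]
    raw_size = len(json.dumps(sample).encode("utf-8"))
    blob = compress_json(sample)
    print(f"raw={raw_size} bytes, compressed={len(blob)} bytes")
```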
-
Have an effective tagging strategy
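A tagging strategy only helps if it is enforced. One possible check reports which required tags a resource is missing. The required keys below are assumptions for illustration; the real set depends on how the organization allocates cost and ownership.

```python
# Hypothetical required tag keys, compared case-insensitively.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def missing_tags(resource_tags: dict) -> list:
    """Return required tag keys that are absent or have an empty value."""
    present = {
        key.lower()
        for key, value in resource_tags.items()
        if value is not None and str(value).strip()
    }
    return sorted(REQUIRED_TAGS - present)


if __name__ == "__main__":
    print(missing_tags({"Owner": "team-a", "environment": "prod"}))
```

A check like this could run in CI against infrastructure definitions, or periodically against a resource inventory.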
-
Place the accountability (and budget) for network charges on your application and solution teams
-
For compute and storage, utilize reserved resources/instances
-
Moving to microservices? Think about network, storage, and monitoring costs
-
Treat Your Infrastructure like Software
-
Focus on Your Team, Not on the Cost
- What is the impact of valuable/senior employees resigning because they are not comfortable with the new stack?
- What is your training session budget?
- What is the impact of having no productivity during the training session?
- How do you manage the lack of code quality, reliability, performance, and productivity because of the new stack?
Effectively Navigating Organizational Politics
- Who was (or was not) involved in making the decision?
- Who does this decision impact?
- What did that process look like?
- Are the decision makers perceived as having the authority to make this decision?
- Who is accountable for the impact of this decision?