@HarshadRanganathan
Last active March 21, 2022 01:37
Things Cloud Engineers Should Know - Red Hat
  • Multi Cloud Decisions

    Key Enablers

    • Workload Portability
    • Ability to negotiate with suppliers
    • Ability to select best tool for a given job

    Keys

    • Visibility - trusted single source of truth
    • Efficiency - across dev, qa, security and operations
    • Governance - automated security, code quality, vulnerability management, policy enforcement
  • Use Managed Services

  • Every Engineer should be a Cloud Engineer

  • Keep Scalability in mind but don't overdo it

    • Does everything have to scale automatically?
    • Know the upper bound
    • Gather usage data, continue testing and planning
    • Identify limiting factors e.g. database
    • Ability of code to take advantage of more CPU/Memory
  • Containers aren't magic

    • Use container scanning services to periodically check image contents for the latest known vulnerabilities
    • Create secure containers and run them in a strict container sandbox
  • Re-Platform every 5 to 10 years

    • Increased velocity
    • Future proofing
    • Scalability
    • Security
    • Efficiency
    • Community backing
  • Visualize distributed systems

  • Serverless bad practices

    • Deploying a lot of functions - increases size, complexity and maintenance
    • Calling functions asynchronously - asynchronous calls increase the complexity of a system, and costs rise because a response channel and a serverless message queue are required
    • Employing many libraries - increases warm-up time
    • Using many technologies - requires people with skills in all of them
    • Not documenting functions
  • Topology

    • Modularity - separation of concerns
    • Deployment Strategy e.g. Canary, blue/green
    • Datacenter affinity - active/active, active/passive
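A canary deployment sends a small, stable share of traffic to the new version. The sketch below is one possible way to split traffic, assuming requests carry a user ID; the `route` function and the "canary"/"stable" labels are hypothetical names for illustration, not part of any specific load balancer.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Pick a deployment for a user; the same user always gets the same answer."""
    # Hash the user ID so the assignment is deterministic across requests.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(route(u, 10) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.1%}")
```

Hashing rather than picking at random keeps each user on one version, which makes canary metrics easier to interpret. Raising `canary_percent` step by step widens the rollout without moving existing canary users back to the stable version.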
  • Understand how services work under the hood - e.g. Lambda cold start times, running out of IP addresses

  • Failing cloud migration

    • Not optimizing for the cloud - doing just lift and shift
    • Lack of architectural strategy - downtime management, latency, data migration etc.

    Antipatterns

    • Wild west - each business unit (BU) buys its own logging and monitoring solutions and runs its own CI/CD workflows
    • Command & Control - Ticketing process for cloud
  • Security is Essential

  • Automation is required

  • Secrets

    • Know where the secrets are kept
    • Audit secrets - rotation, revocation
    • Encryption
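Auditing rotation can start as a simple report over an inventory of secrets. The following is a minimal sketch, assuming such an inventory can be exported from whichever store is in use; `SecretRecord`, its fields, and the 90-day limit are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class SecretRecord:
    name: str               # logical name of the secret
    store: str              # where the secret is kept
    last_rotated: datetime  # timestamp of the most recent rotation
    revoked: bool = False   # revoked secrets no longer need rotation


def overdue_for_rotation(
    secrets: List[SecretRecord], max_age: timedelta, now: datetime
) -> List[str]:
    """Return names of active secrets whose last rotation is older than max_age."""
    return [
        s.name
        for s in secrets
        if not s.revoked and now - s.last_rotated > max_age
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [
        SecretRecord("db-password", "vault", now - timedelta(days=120)),
        SecretRecord("api-key", "vault", now - timedelta(days=10)),
    ]
    print(overdue_for_rotation(inventory, timedelta(days=90), now))
```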
  • Never take a single region dependency

    • Redundant storage
    • Availability zones (AZ)
    • Backup
    • Recovery
    • Failovers should be automatic
    • Practice failovers
  • Monitoring with Visualizations/Dashboards

  • Incident Analysis and Chaos Engineering

  • Monitoring

    • Functionality
    • Usage patterns
    • User Experience
    • Security
    • Billing
    • Health Status
  • KISS It

    • Avoid premature optimizations
    • Start small and use MVPs to guide design decisions
    • Read documentation - pay attention to limits and error codes
    • Focus on learning best practices
    • Use standard naming conventions
    • Delete unused cloud resources to remove clutter
    • Find system failure scenarios and provide runbooks
  • Maintain Service Levels with Feature flags and circuit breakers
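As a rough illustration of how a feature flag and a circuit breaker can work together, here is a minimal sketch. It assumes a single-threaded caller; `CircuitBreaker`, `recommendations`, the flag name, and the thresholds are all hypothetical, not taken from a particular library.

```python
import time
from typing import Callable, Dict, List, Optional


class CircuitBreaker:
    """Stops calling a failing dependency for a while and serves a fallback instead."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], object], fallback: Callable[[], object]):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()  # open: do not touch the dependency
            # Reset period elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result


def recommendations(flags: Dict[str, bool], breaker: CircuitBreaker,
                    fetch: Callable[[], List[str]]) -> List[str]:
    """Serve an optional feature only when its flag is on and its backend is healthy."""
    if not flags.get("recommendations", False):
        return []  # feature switched off: degrade gracefully
    return breaker.call(fetch, fallback=lambda: [])
```

The flag lets operators turn a non-essential feature off deliberately, while the breaker turns it off automatically when its dependency misbehaves. In both cases the core request still succeeds.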

  • Design First, then code

  • Strategies to cope with duplicates

    • Stateless consumers
    • Keeping state - use TTL
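When a consumer has to keep state to detect duplicates, a TTL keeps that state bounded. The sketch below assumes each message carries a unique ID and that redeliveries arrive within the TTL window; `TtlDeduplicator` and `handle` are hypothetical names, and a real system would likely keep this state in a shared store rather than in process memory.

```python
import time
from typing import Callable, Dict, List


class TtlDeduplicator:
    """Remembers message IDs for a limited time so redeliveries can be skipped."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}  # message ID -> expiry time

    def first_time(self, message_id: str) -> bool:
        """Return True if the ID has not been seen within the TTL window."""
        now = self._clock()
        # Drop expired entries so the state does not grow without bound.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if message_id in self._seen:
            return False
        self._seen[message_id] = now + self._ttl
        return True


def handle(message_id: str, payload: str,
           dedup: TtlDeduplicator, sink: List[str]) -> None:
    """Process a message only if it is not a recent duplicate."""
    if dedup.first_time(message_id):
        sink.append(payload)
```

The TTL should be longer than the longest redelivery delay the messaging system can produce. A duplicate that arrives after its entry has expired will be processed again.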
  • Avoid big re-writes

    Risks

    • Not making deadlines
    • Going over budget
    • Burning out team members
    • Losing stakeholder confidence

    Steps

    • Be realistic
    • Utilize strangler pattern - incrementally modify an existing system by extracting parts of it gradually
    • Repeat
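One common way to apply the strangler pattern is a routing facade in front of the old system. The sketch below assumes requests can be routed by path prefix; `StranglerFacade`, `legacy_system`, and `new_orders_service` are hypothetical stand-ins for real services.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_system(path: str) -> str:
    """Stand-in for the existing system."""
    return f"legacy:{path}"


def new_orders_service(path: str) -> str:
    """Stand-in for a part that has been extracted and rewritten."""
    return f"new:{path}"


class StranglerFacade:
    """Routes each request to a migrated handler if one exists, else to legacy."""

    def __init__(self, legacy: Handler):
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        """Register a new handler for everything under the given path prefix."""
        self._migrated[prefix] = handler

    def handle(self, path: str) -> str:
        # Check longer prefixes first so the most specific migration wins.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self._migrated[prefix](path)
        return self._legacy(path)


if __name__ == "__main__":
    facade = StranglerFacade(legacy_system)
    facade.migrate("/orders", new_orders_service)
    print(facade.handle("/orders/1"))
    print(facade.handle("/billing/7"))
```

Each `migrate` call corresponds to one "repeat" step: callers keep using the facade, and the legacy system shrinks one route at a time.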
  • QA is also feedback, early feedback

  • FinOps

    • Make finance and procurement part of the planning process
    • Provide guardrails for shared financial accountability
    • Design and architect with finance in mind
    • Use financial tracing to align cloud spending to product and customer metrics
    • Provide real-time visibility of cloud spending for consuming teams

    How it can be done

    • Offer visibility for everyone
    • Identify your cost drivers and metrics
    • Have guardrails for the cost-control policy
  • Make sure you are watching and measuring your costs

  • Set billing alarms

  • Leverage a content delivery network

  • Stay within an availability zone (or region) in places where you are not looking to improve availability. Needless region-to-region costs are a killer

  • Leverage data compression

  • Have an effective tagging strategy
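A tagging strategy is easier to keep effective when it is checked automatically. The following is a minimal sketch, assuming a resource inventory is available as a mapping from resource ID to tags; the required tag names and the `missing_tags` helper are hypothetical examples, not a provider API.

```python
from typing import Dict, List

# Hypothetical policy: every resource must carry these tags.
REQUIRED_TAGS = ("owner", "cost-center", "environment")


def missing_tags(resources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ID to the required tags it lacks."""
    report: Dict[str, List[str]] = {}
    for resource_id, tags in resources.items():
        # A tag that is absent or blank counts as missing.
        missing = [t for t in REQUIRED_TAGS if not tags.get(t, "").strip()]
        if missing:
            report[resource_id] = missing
    return report


if __name__ == "__main__":
    inventory = {
        "vm-1": {"owner": "team-a", "cost-center": "cc-42", "environment": "prod"},
        "bucket-7": {"owner": "team-b"},
    }
    print(missing_tags(inventory))
```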

  • Place the accountability (and budget) for network charges on your application and solution teams

  • For compute and storage, utilize reserved resources/instances

  • Moving to microservices? Think about network, storage, and monitoring costs

  • Treat Your Infrastructure like Software

  • Focus on Your Team, Not on the Cost

    • What is the impact of valuable/senior employees resigning because they are not comfortable with the new stack?
    • What is your training session budget?
    • What is the impact of having no productivity during the training session?
    • How do you manage the lack of code quality, reliability, performance, and productivity because of the new stack?

Effectively Navigating Organizational Politics

  • Who was (or was not) involved in making the decision?
  • Who does this decision impact?
  • What did that process look like?
  • Are the decision makers perceived as having the authority to make this decision?
  • Who is accountable for the impact of this decision?