Skip to content

Instantly share code, notes, and snippets.

@kelapure
Last active August 29, 2015 14:24
Show Gist options
  • Save kelapure/2e3838d6179f761ed04c to your computer and use it in GitHub Desktop.
Save kelapure/2e3838d6179f761ed04c to your computer and use it in GitHub Desktop.
Failure Scenario testing for Cloud Foundry

Resiliency Testing for Cloud Foundry

Application Failure Scenarios

  1. [HEAP-OOM] App instance allocates excess Java Heap Memory. App repetitively suffers java heap OOM.
  2. [NATIVE-OOM] App instance allocates excess native memory leading to crashes of the warden container.
  3. [EXCESS-CPU] App instance is pegged at > 90% CPU for a sustained period of time.
  4. [DEADLOCK] App instance is pegged < 50% CPU inspite of sustained increasing request traffic.
  5. [GC-PAUSES] App unresponsive for random periods of time.
  6. [RANDOM-DEATH] App instance randomly exits.

Service Failure Scenarios

  1. Logging Infrastructure service is unresponsive / not accessible
  2. App Dynamics Monitoring Service is unresponsive / not accessible
  3. Other system infrastructure services are not available

Component Failure Scenarios

  1. [CF-STATELESS] Random failure of critical CF stateless components
  2. [CF-STATEFUL] Random storage volumes (persistence stores) in a deployment.
  3. [AZ-LOSS] Loss of an entire AZ

Hardware Failure Scenarios

  1. [NETWORK-PARTITION] Entire ESXi Hypervisor hosting DEAs and other CF components goes offline.
  2. [HYPERVISOR-DEATH] Entire ESXi Hypervisor hosting DEAs and other CF components dies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment