@mjallday
Created March 15, 2015 22:14
Balanced Partial Outage Post Mortem - 2015-03-15

Balanced experienced a partial outage that affected 25% of card processing transactions between 8:40AM and 9:42AM this morning due to a degraded machine which was not correctly removed from the load balancer.

The core of the issue was in our secure vault system, which handles storage and retrieval of sensitive card data. One of the machines stopped sending messages, which caused some requests to be queued but never processed. However, our automated health checks did not flag the machine as unhealthy, so it was not removed from the load balancer.
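The failure mode described above, where a machine still passes its health checks but has stopped making progress on its queue, can be illustrated with a small sketch. The post mortem does not say how Balanced's health checks work or how they were changed. `WorkerStats`, `is_healthy`, `healthy_backends`, the 30-second stall threshold, and the `vault-N` backend names are all hypothetical, chosen only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkerStats:
    """Hypothetical per-machine counters a vault node might expose."""
    queued: int               # requests waiting to be processed
    last_processed_at: float  # unix timestamp of the last completed request


def is_healthy(stats: WorkerStats, now: float, max_stall_seconds: float = 30.0) -> bool:
    """Return False when work is waiting but nothing has completed recently.

    A plain "is the process up?" check passes for a stalled node; this
    check additionally requires forward progress whenever the queue is non-empty.
    """
    if stats.queued == 0:
        return True  # an idle machine is fine even if nothing completed recently
    return (now - stats.last_processed_at) <= max_stall_seconds


def healthy_backends(backends: Dict[str, WorkerStats], now: float) -> List[str]:
    """Names of backends that should stay in the load balancer pool (sorted)."""
    return sorted(name for name, s in backends.items() if is_healthy(s, now))


# Usage: vault-3 has a growing queue but has not completed anything in 5 minutes.
now = time.time()
pool = {
    "vault-1": WorkerStats(queued=3, last_processed_at=now - 1),
    "vault-2": WorkerStats(queued=0, last_processed_at=now - 600),
    "vault-3": WorkerStats(queued=40, last_processed_at=now - 300),  # stalled
}
print(healthy_backends(pool, now))  # ['vault-1', 'vault-2']
```

The design choice here is to treat "idle" and "stalled" differently: a machine with an empty queue is healthy regardless of when it last finished a request, while a machine holding queued work must show recent completions to stay in the pool.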

We found the root cause of the issue and have resolved the problem to ensure that it will not recur.

We're dedicated to providing high-quality support and a culture of openness during the migration process. If you continue to experience issues, please email us at [email protected] or tweet at @balancedstatus.

@matin

matin commented Mar 16, 2015

@taylorbrooks: Are you still experiencing any issues? The dashboard should be working fine. I just checked on your marketplace in particular to verify.

@msherry

msherry commented Mar 16, 2015

Transactions created after the end of the outage showed up correctly. I am manually reingesting any transactions created during the outage, as some of them didn't show up correctly in the dashboard.
