Summary of incident affecting the Ably production cluster on 3 January 2019 - preliminary investigation and conclusions
See the incident on our status site at https://status.ably.io/incidents/574
There was an incident affecting the Ably production cluster on 3 January that caused a significant level of disruption for multiple accounts, primarily in the us-east-1 region.
The incident was most severe for a 2-hour period between 1520 and 1720 (all times in UTC), during which error rates were elevated across all regions, with the greatest impact in us-east-1. During that time, channel attach error rates averaged nearly 50%, and message publish error rates averaged around 42%. Error rates remained elevated for a further 2½ hours as the system recovered; taking the entire period of the incident, channel attach error rates averaged around 33% and message publish error rates averaged 22%.