Stream experienced a minor incident on January 28, 2020 affecting us-east, lasting 46m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- monitoring Jan 28, 2020, 04:49 PM UTC
A recent release caused a load increase on part of the chat infrastructure, which led to degraded performance and timeout errors. Remediation is in progress.
- monitoring Jan 28, 2020, 04:56 PM UTC
We are continuing to monitor for any further issues.
- monitoring Jan 28, 2020, 04:57 PM UTC
We are continuing to monitor for any further issues.
- resolved Jan 28, 2020, 04:58 PM UTC
This incident has been resolved.
- postmortem Jan 28, 2020, 04:59 PM UTC
Between 4:05PM and 4:45PM UTC on January 28, 2020 we had an API outage caused by performance degradation. The event was triggered by a new release to our Chat API servers; shortly after the new release went live, load on our database infrastructure increased, causing HTTP response times to spike and, in some cases, requests to time out. The event was detected by our latency and error monitoring. The team began remediation by rolling back to the previous version at 4:20PM UTC. Unfortunately, the rollback did not resolve the problem entirely. After another rollback attempt we realised there were still pending queries from the previous release running on our PostgreSQL database. We manually terminated all the pending queries at 4:40PM UTC; after that the error rate dropped back to 0%. The outage affected 5% of HTTP requests at its peak (4:20PM to 4:27PM UTC).
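
For readers who need to perform a similar cleanup, the following is a minimal sketch of how lingering queries might be terminated on PostgreSQL. It is not the procedure used during this incident. It assumes a DB-API driver that uses `%s` placeholders (for example psycopg2), that the offending queries can be identified by their `application_name`, and that an age threshold is a reasonable filter. The function name, the `chat-api` label, and the 30-second threshold are hypothetical. `pg_stat_activity`, `pg_backend_pid()` and `pg_terminate_backend()` are standard PostgreSQL features.

```python
from typing import Any, List

# Select active backends for one application whose current query is older
# than a threshold, and terminate each of them. The current session is
# excluded so the cleanup does not kill itself.
TERMINATE_SQL = """
SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND application_name = %s
  AND query_start < now() - %s * interval '1 second'
"""


def terminate_stale_queries(
    conn: Any, application_name: str, min_age_seconds: int
) -> List[int]:
    """Terminate long-running queries and return the PIDs that were signalled.

    `conn` is assumed to be an open DB-API connection whose role is allowed
    to terminate the target backends (hypothetical setup, not from the report).
    """
    cur = conn.cursor()
    try:
        cur.execute(TERMINATE_SQL, (application_name, min_age_seconds))
        rows = cur.fetchall()
    finally:
        cur.close()
    # pg_terminate_backend returns true when the signal was sent successfully.
    return [pid for pid, terminated in rows if terminated]
```

A call might look like `terminate_stale_queries(conn, "chat-api", 30)`. Filtering by application name and query age keeps the blast radius small. Running the same `WHERE` clause as a plain `SELECT` first is a sensible dry run before terminating anything.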