CloudRepo incident

Package Repository Outage

Critical Resolved View vendor source →

CloudRepo experienced a critical incident on May 9, 2018, lasting —. The incident has been resolved; the full update timeline is below.

Started
May 09, 2018, 05:00 PM UTC
Resolved
May 09, 2018, 05:00 PM UTC
Duration
Detected by Pingoru
May 09, 2018, 05:00 PM UTC

Update timeline

  1. resolved May 16, 2018, 10:06 AM UTC

    Customer Impact: Access to our storage APIs (publishing/reading packages) was returning 500 errors for some partners. Root Cause: Our servers exhausted their connections to the storage layer and our monitoring system did not alert us to this degraded state - a partner alerted us instead. Resolution: After we were alerted of this issue we were able to restore functionality to all partners. Duration: Approximately two hours Future Mitigation: To prevent this from happening again, we will be implementing several changes: 1) Improve our monitoring to detect 500 errors as soon as they occur. 2) Increased the size of our cluster to give us more headroom in our connection pools. 3) Continue to investigate root cause and fix anything that may be holding on to connections.