CloudRepo incident

Package Repository Outage

Critical Resolved View vendor source →

CloudRepo experienced a critical incident on May 10, 2018, lasting —. The incident has been resolved; the full update timeline is below.

Started
May 10, 2018, 05:00 PM UTC
Resolved
May 10, 2018, 05:00 PM UTC
Duration
Detected by Pingoru
May 10, 2018, 05:00 PM UTC

Update timeline

  1. resolved May 16, 2018, 10:38 AM UTC

    Customer Impact: Access to our storage APIs (publishing/reading packages) was returning 500 errors for some partners. This is a repeat of the May 9th outage - please refer to the incident summary for more details. Resolution: After we were alerted of this issue we were able to restore functionality to all partners. Duration: Approximately 45 minutes at approximately 11:00 CST and 20 minutes around 18:00 CST. Future Mitigation: To prevent this from happening again, we will be implementing several changes: 1) Improve our monitoring to detect 500 errors as soon as they occur. 2) Increased the size of our cluster to give us more headroom in our connection pools. 3) Continue to investigate root cause and fix anything that may be holding on to connections.