Gemfury incident

Upload processing delays

Minor · Resolved

Gemfury experienced a minor incident on August 20, 2020 affecting Uploads, lasting 9h 52m. The incident has been resolved; the full update timeline is below.

Started
Aug 20, 2020, 06:10 PM UTC
Resolved
Aug 21, 2020, 04:02 AM UTC
Duration
9h 52m
Detected by Pingoru
Aug 20, 2020, 06:10 PM UTC

Affected components

Uploads

Update timeline

  1. investigating Aug 20, 2020, 02:46 AM UTC

    We are investigating upload delays caused by locked-up background job workers.

  2. investigating Aug 20, 2020, 07:18 AM UTC

    The issue is difficult to reproduce. We have added instrumentation to capture it when it occurs in production, and we will keep this incident open through the next period of high traffic.

  3. identified Aug 20, 2020, 06:10 PM UTC

    We've found code that performs very poorly for packages with many versions. Multiple background jobs running this code consume all the worker slots, preventing other jobs from being processed. (Edit: These are the jobs that bring your uploads to your dashboard and indexes.)

  4. monitoring Aug 20, 2020, 07:53 PM UTC

    We've sped up one hot path by about 75%, and provisioned larger workers. Still seeing delays.

  5. monitoring Aug 20, 2020, 08:22 PM UTC

    We're starting to see some progress. Workers are keeping up with incoming work. Still monitoring.

  6. resolved Aug 21, 2020, 04:02 AM UTC

    We were able to stabilize uploads, and we'll keep this as an area of focus. Resolving.
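
The failure mode in update 3, where slow jobs occupy every worker slot and block unrelated jobs, can be illustrated with a small Python sketch. This is not Gemfury's code, and the source does not describe their job system. The function names, slot counts, and timings below are hypothetical. The sketch compares a shared worker pool with one that reserves a slot for fast jobs, which is one common mitigation and not necessarily the one Gemfury applied (the updates mention only a faster hot path and larger workers).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_job(seconds: float) -> str:
    """Hypothetical stand-in for a job that is slow for packages with many versions."""
    time.sleep(seconds)
    return "slow"


def fast_job() -> float:
    """Hypothetical stand-in for a quick upload job; returns the time it ran."""
    return time.monotonic()


def fast_job_wait(shared: bool, slots: int = 2, slow_seconds: float = 0.3) -> float:
    """Return how long a fast job takes to complete after being enqueued.

    shared=True:  slow and fast jobs compete for the same worker slots.
    shared=False: one slot is reserved for fast jobs (requires slots >= 2).
    """
    slow_pool = ThreadPoolExecutor(max_workers=slots if shared else slots - 1)
    fast_pool = slow_pool if shared else ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    try:
        # Enqueue enough slow jobs to occupy every slot of a shared pool.
        slow = [slow_pool.submit(slow_job, slow_seconds) for _ in range(slots)]
        # The fast job is enqueued last, as an upload arriving mid-incident would be.
        done_at = fast_pool.submit(fast_job).result()
        for future in slow:
            future.result()
    finally:
        slow_pool.shutdown()
        if not shared:
            fast_pool.shutdown()
    return done_at - start


if __name__ == "__main__":
    print(f"shared pool:   fast job waited {fast_job_wait(shared=True):.2f}s")
    print(f"reserved slot: fast job waited {fast_job_wait(shared=False):.2f}s")
```

In the shared configuration the fast job cannot start until a slow job releases a slot, so its wait grows with the slow job's duration. With a reserved slot it runs right away, at the cost of less capacity for the slow jobs.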