Skylight incident

Data Processing Delay

Minor Resolved

Skylight experienced a minor incident on May 28, 2019 affecting Application, lasting 6h 28m. The incident has been resolved; the full update timeline is below.

Started
May 28, 2019, 08:04 PM UTC
Resolved
May 29, 2019, 02:33 AM UTC
Duration
6h 28m
Detected by Pingoru
May 28, 2019, 08:04 PM UTC

Affected components

Application

Update timeline

  1. identified May 28, 2019, 08:04 PM UTC

    One of our data processing workers is experiencing high load. We are working on provisioning additional resources. In the meantime, around 20% of customer apps may experience data processing delays of up to an hour.

  2. identified May 28, 2019, 09:08 PM UTC

    We are suspending data processing on the affected worker in preparation for deploying the upgraded infrastructure.

  3. monitoring May 28, 2019, 09:21 PM UTC

    We have deployed the upgraded infrastructure. Data processing should gradually catch up over the next few hours for the affected customer apps (~20%). We apologize for the inconvenience.

  4. identified May 29, 2019, 12:23 AM UTC

    Data processing for most customer apps returned to normal as of 16:00 Pacific Time. Unfortunately, one of the new workers (affecting around 10% of customer apps) is still having issues. We are taking further action to address the underlying issue.

  5. monitoring May 29, 2019, 01:01 AM UTC

    The remaining worker has resumed processing data. We will be monitoring its progress.

  6. resolved May 29, 2019, 02:33 AM UTC

    The remaining worker finished processing its backlog at 19:16 Pacific Time.