Qubole incident

AWS API issue and potential capacity issue for workloads

Minor · Resolved

Qubole experienced a minor incident on April 15, 2021 affecting Command Processing, lasting 13d 22h. The incident has been resolved; the full update timeline is below.

Started
Apr 15, 2021, 02:13 PM UTC
Resolved
Apr 29, 2021, 01:06 PM UTC
Duration
13d 22h
Detected by Pingoru
Apr 15, 2021, 02:13 PM UTC

Affected components

Command Processing

Update timeline

  1. investigating Apr 15, 2021, 02:13 PM UTC

    Devops is currently investigating a potential capacity issue on api.qubole.com that may cause some workloads to stall or fail. Amazon reported an AWS outage on 4/15/2021, since resolved, that may be a contributing factor. Devops is continuing to review the Qubole environment to determine whether further correction is needed.

  2. investigating Apr 22, 2021, 01:47 PM UTC

    We are continuing to investigate this issue.

  3. investigating Apr 22, 2021, 01:55 PM UTC

    We are continuing to investigate this issue.

  4. monitoring Apr 23, 2021, 02:55 PM UTC

    Devops has implemented a change as part of the fix for the Notebooks outage that they believe will also resolve this issue. They are currently monitoring and testing API calls to ensure that all use cases are covered.

  5. monitoring Apr 27, 2021, 08:49 PM UTC

    Devops is still reviewing a few remaining concerns to ensure this issue is fully resolved.

  6. resolved Apr 29, 2021, 01:06 PM UTC

    Devops considers the issue resolved at this time.