Qubole incident

Unable to start clusters

Critical, Resolved

Qubole experienced a critical incident beginning April 22, 2021 that affected four components (QDS API, Command Processing, Qubole Scheduler, and Cluster Operations) and lasted 5d 7h. The incident has been resolved; the full update timeline is below.

Started: Apr 22, 2021, 12:54 PM UTC
Resolved: Apr 27, 2021, 08:23 PM UTC
Duration: 5d 7h
Detected by Pingoru: Apr 22, 2021, 12:54 PM UTC

Affected components

QDS API, Command Processing, Qubole Scheduler, Cluster Operations

Update timeline

  1. investigating Apr 22, 2021, 12:54 PM UTC

    Devops is investigating the general inability to start new clusters.

  2. investigating Apr 22, 2021, 02:56 PM UTC

    We are continuing to investigate this issue.

  3. monitoring Apr 23, 2021, 06:08 PM UTC

    Devops is monitoring its latest fix, which should resolve the issue. Additional information about the resolution will be added after monitoring.

  4. resolved Apr 27, 2021, 08:23 PM UTC

    Devops ran into a dependency issue that prevented expiring nodes from going offline and being replaced. Devops manually replaced the affected nodes and resolved the dependency, which will allow older nodes to expire automatically going forward.