Qubole incident

Cluster failure on wellness.qubole.com

Critical Resolved

Qubole experienced a critical incident affecting Cluster Operations that began on April 19, 2021 and lasted about 3 days. The incident has been resolved; the full update timeline is below.

Started: Apr 19, 2021, 05:10 PM UTC
Resolved: Apr 22, 2021, 05:45 PM UTC
Duration: 3d
Detected by Pingoru: Apr 19, 2021, 05:10 PM UTC

Affected components

Cluster Operations

Update timeline

  1. investigating Apr 19, 2021, 05:10 PM UTC

    Clusters in the wellness.qubole.com environment are offline. Support and DevOps are investigating the cause at critical priority and will update this page with more information as it becomes available.

  2. investigating Apr 19, 2021, 05:42 PM UTC

    Additional detail: Some clusters in the environment appear to be running but are not accepting commands through Airflow. Cluster start and termination responses appear to be inconsistent. DevOps is reviewing.

  3. monitoring Apr 22, 2021, 04:58 PM UTC

    DevOps believes it has identified an issue that prevents clusters from starting, and is monitoring its latest change to see whether it resolves the issue.

  4. resolved Apr 22, 2021, 05:45 PM UTC

    This incident has been resolved.