Qubole experienced a critical incident on April 22, 2021 affecting QDS API and Command Processing and 1 more component, lasting 5d 7h. The incident has been resolved; the full update timeline is below.
Affected components
QDS APICommand ProcessingQubole SchedulerCluster Operations
Update timeline
- investigating Apr 22, 2021, 12:54 PM UTC
Devops is investigating the general inability to start new clusters.
- investigating Apr 22, 2021, 02:56 PM UTC
We are continuing to investigate this issue.
- monitoring Apr 23, 2021, 06:08 PM UTC
Devops is monitoring its latest fix -- this should be resolved. Additional information about the resolution will be added after monitoring.
- resolved Apr 27, 2021, 08:23 PM UTC
Devops ran into a dependency issue that prevented expiring nodes from going offline and being replaced. In addition to manually replacing nodes, the dependency was resolved, which will allow older nodes to expire automatically going forward.