Qubole incident

Degraded performance issue Airflow clusters on api.q

Minor Resolved

Qubole experienced a minor incident on February 16, 2023 that affected Cluster Operations and lasted about one day. The incident has been resolved; the full update timeline is below.

Started: Feb 16, 2023, 02:36 PM UTC
Resolved: Feb 17, 2023, 02:56 PM UTC
Duration: 1d
Detected by Pingoru: Feb 16, 2023, 02:36 PM UTC

Affected components

Cluster Operations

Update timeline

  1. investigating Feb 16, 2023, 02:36 PM UTC

    For customers using Airflow, there is an issue that prevents Airflow clusters from starting. All other aspects of Qubole operations are functioning normally. Airflow clusters do not terminate automatically because of the nature of their function, so any Airflow clusters that are currently running continue to function normally. However, if an Airflow cluster is terminated or restarted for any reason, it will not come back up because loading Airflow will fail. We are working to resolve the issue and will post an update within the next 2 hours, or earlier if new information is available or service is restored.

  2. investigating Feb 16, 2023, 05:18 PM UTC

    We are investigating options to resolve the issue. We will continue to post status updates.

  3. identified Feb 16, 2023, 10:15 PM UTC

    A library used by Airflow relies on a common Python package that was recently upgraded in the open-source community, and the upgrade introduced breaking changes. In the meantime, we are exploring ways to bring a workaround for this upstream issue into QDS Airflow.

  4. monitoring Feb 17, 2023, 04:19 AM UTC

    We have implemented a preliminary fix, and Airflow clusters are now coming up successfully. Customers may experience longer than normal startup times while we continue maintenance work. Please reach out to Support if you experience any failures when starting Airflow clusters.

  5. resolved Feb 17, 2023, 02:56 PM UTC

    We have implemented a preliminary fix, and Airflow clusters are now coming up successfully. We will continue to investigate a permanent fix for the issue. Customers may experience longer than normal startup times while we continue maintenance work. Please reach out to Support if you experience any failures when starting Airflow clusters.
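
Update 3 attributes the failure to an upstream upgrade of a Python package that an Airflow library depends on. The sketch below is one way to detect that kind of drift before a cluster starts: it compares installed package versions against exclusive upper bounds. It is illustrative only and is not Qubole's fix. The incident report does not name the package, so `example-package` and the bound `2.0` are placeholders, and the helper names are hypothetical.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.4.46' into (1, 4, 46); stops at the first non-numeric part,
    so pre-release suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def exceeds_upper_bound(installed, upper_bound):
    """Return True if the installed version is at or above the excluded bound."""
    a, b = parse_version(installed), parse_version(upper_bound)
    # Pad with zeros so that '2' and '2.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_pins(upper_bounds):
    """Report packages whose installed version breaks an upper-bound pin.

    upper_bounds maps a distribution name to an exclusive upper bound.
    Packages that are not installed are skipped.
    """
    problems = []
    for name, bound in upper_bounds.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
        if exceeds_upper_bound(installed, bound):
            problems.append((name, installed, bound))
    return problems


if __name__ == "__main__":
    # Placeholder package name and bound; not taken from the incident report.
    for name, installed, bound in check_pins({"example-package": "2.0"}):
        print(f"{name} {installed} is not below pinned bound {bound}")
```

In practice, a pinned requirements or constraints file applied at install time serves the same purpose; a runtime check like this only reports drift after it has happened.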