Qubole incident

azure.qubole.com is down due to Azure storage issues

Qubole experienced a major incident on September 18, 2020 affecting Command Processing, lasting 7h 52m. The incident has been resolved; the full update timeline is below.

Started
Sep 18, 2020, 01:51 PM UTC
Resolved
Sep 18, 2020, 09:43 PM UTC
Duration
7h 52m
Detected by Pingoru
Sep 18, 2020, 01:51 PM UTC

Affected components

Command Processing

Update timeline

  1. investigating Sep 18, 2020, 01:51 PM UTC

    Qubole DevOps is investigating an intermittent slowness issue with command processing on azure.qubole.com.

  2. investigating Sep 18, 2020, 02:29 PM UTC

    We are continuing to investigate the issue with the Azure support team, and early data indicates a problem with Azure Database for MySQL.

  3. investigating Sep 18, 2020, 03:07 PM UTC

    We are continuing to investigate the issue with the Azure support team, and it has been confirmed that Azure Database for MySQL is experiencing slowness.

  4. investigating Sep 18, 2020, 03:23 PM UTC

    Azure support confirmed a storage issue in the East US region affecting underlying services such as RDS. For details, please refer to https://status.azure.com/en-us/status

  5. investigating Sep 18, 2020, 03:35 PM UTC

    Azure support confirmed a storage issue in the East US region affecting underlying services such as RDS. For details, please refer to https://status.azure.com/en-us/status

  6. investigating Sep 18, 2020, 03:47 PM UTC

    The issue appears to have worsened and is now impacting the availability of the entire platform, including cluster and command operations. Qubole DevOps is working with the Azure support team to restore service.

  7. identified Sep 18, 2020, 05:28 PM UTC

    Qubole DevOps and Azure support are continuing to work on restoring services, as Azure's primary region, East US, is experiencing major issues with many of its primary services. While Azure support has not provided an ETA for service restoration in the region, we are pursuing two alternative approaches: (1) moving our application to another region, which has started but is being slowed by the current degradation in the region, and (2) creating a new RDS instance, which may not be affected in the same way as the existing instances. We are working to establish a timeline for both alternatives and will keep you updated as we make further progress.

  8. monitoring Sep 18, 2020, 06:35 PM UTC

    Qubole DevOps has been working with Azure support, and our operational monitoring shows services returning to normal. We are verifying services within the Qubole application and will post an update if further issues are discovered. You may resume normal operations at this time.

  9. resolved Sep 18, 2020, 09:43 PM UTC

    Qubole DevOps has verified the environment is functioning within normal operating guidelines with no new alerts. We are resolving this incident.