Dremio incident

Intermittent Query Failures

Major, Resolved

Dremio experienced a major incident on September 7, 2023, affecting Site Availability and lasting 1d 1h. The incident has been resolved; the full update timeline is below.

Started: Sep 07, 2023, 03:50 PM UTC
Resolved: Sep 08, 2023, 05:40 PM UTC
Duration: 1d 1h
Detected by Pingoru: Sep 07, 2023, 03:50 PM UTC

Affected components

Site Availability

Update timeline

  1. investigating Sep 07, 2023, 03:50 PM UTC

    Following the deployment on Wednesday, Sept 6th, 2023, our monitoring has detected an increase in query failure rates for some projects. We believe this is only impacting AWS projects created since Aug 17th, 2023. If you are experiencing query failures, please contact Dremio Support, who can assist with mitigation. We believe the issue has been identified and a fix is being evaluated. We will provide a further update when the fix is deployed.

  2. identified Sep 07, 2023, 03:50 PM UTC

    The issue has been identified and a fix is being implemented.

  3. identified Sep 08, 2023, 12:31 AM UTC

    We are moving the mitigation through our testing environments in preparation for deploying it to our production systems.

  4. identified Sep 08, 2023, 03:55 PM UTC

    A fix has been deployed to the US site and is being evaluated. Initial reports are showing increased latency in the UI. We will update as we consider next steps.

  5. monitoring Sep 08, 2023, 04:14 PM UTC

    The fix deployed to the US site is performing well now. We will continue to monitor and then roll out the same fix to the European site.

  6. monitoring Sep 08, 2023, 05:09 PM UTC

    The fix has been deployed to both production environments and normal operation has resumed. We will continue to monitor before resolving the incident.

  7. resolved Sep 08, 2023, 05:40 PM UTC

    This incident has been resolved.