Dremio incident

Query engines not scaling up

Major Resolved

Dremio experienced a major incident on May 10, 2022 affecting Site Availability, lasting 1h 16m. The incident has been resolved; the full update timeline is below.

Started
May 10, 2022, 05:32 AM UTC
Resolved
May 10, 2022, 06:49 AM UTC
Duration
1h 16m
Detected by Pingoru
May 10, 2022, 05:32 AM UTC

Affected components

Site Availability

Update timeline

  1. investigating May 10, 2022, 05:32 AM UTC

    Query engines are not scaling up on demand, leading to query failures. Engines that are already running will continue to serve queries without issue. We are currently investigating the engine scaling issue.

  2. investigating May 10, 2022, 06:21 AM UTC

    We are continuing to investigate the issue. One potential mitigation has been applied. We are evaluating the effectiveness of the change.

  3. monitoring May 10, 2022, 06:40 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. monitoring May 10, 2022, 06:41 AM UTC

    A mitigation has been applied and appears successful. Monitoring for rebound.

  5. resolved May 10, 2022, 06:49 AM UTC

    This incident has been resolved. Engine scaling is working normally.