InfluxData incident

Elevated Query Error Rate

Status: Major · Resolved

InfluxData experienced a major incident on June 13, 2025 affecting API Queries and Tasks, lasting 6h. The incident has been resolved; the full update timeline is below.

Started
Jun 13, 2025, 04:13 PM UTC
Resolved
Jun 13, 2025, 10:14 PM UTC
Duration
6h
Detected by Pingoru
Jun 13, 2025, 04:13 PM UTC

Affected components

API Queries, Tasks

Update timeline

  1. investigating Jun 13, 2025, 04:13 PM UTC

    We have observed an elevated query error rate and are investigating.

  2. identified Jun 13, 2025, 05:06 PM UTC

    The issue has been identified and a fix is being put into place.

  3. monitoring Jun 13, 2025, 05:39 PM UTC

    A fix has been implemented and deployed. We are monitoring at this time.

  4. monitoring Jun 13, 2025, 07:09 PM UTC

    We are continuing to closely monitor the health of the query API.

  5. monitoring Jun 13, 2025, 08:36 PM UTC

    We are continuing to monitor this region for any further degradation.

  6. resolved Jun 13, 2025, 10:14 PM UTC

    This incident has been resolved.

  7. postmortem Jul 17, 2025, 08:26 PM UTC

    In a multi-tenant environment such as Cloud 2, workloads can change unpredictably: different customers on the shared platform may increase or decrease their activity on the cluster with no advance notice. While we try to shelter customers from one another via rate limits and some partitioning, there is no way to fully protect customers from the impact of a "noisy neighbor". We maintain excess capacity in the cluster to provide headroom for increased workloads, and we also enforce rate limits to ensure that one customer does not overwhelm the cluster (via writes, deletes, or queries). However, the rate limits are not very fine-grained, and at times a single customer can stress the cluster. When that occurs, the team is automatically notified and we add resources to accommodate the extra workload. It can take time for the extra capacity to be fully deployed, leading to temporary performance problems.

    This is what occurred in the Azure eu-west cluster on June 13th. We were notified that query performance was degraded, and the team increased the number of storage pods to provide extra capacity. Once the storage pod limits were fully applied, query TTBR recovered. We apologize for the inconvenience.
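
    The per-tenant rate limiting the postmortem describes is commonly implemented as a token bucket. The sketch below is purely illustrative (the class, parameters, and tenant names are assumptions, not InfluxData's actual implementation); it shows why a coarse request-rate limit cannot account for how expensive an individual query is, which is the "not very fine-grained" gap noted above.

    ```python
    import time

    class TokenBucket:
        """Minimal per-tenant token bucket (illustrative sketch only)."""

        def __init__(self, rate: float, capacity: float):
            self.rate = rate          # tokens refilled per second
            self.capacity = capacity  # maximum burst size
            self.tokens = capacity    # start full
            self.last = time.monotonic()

        def allow(self, cost: float = 1.0) -> bool:
            """Admit a request costing `cost` tokens, or reject it."""
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

    # One bucket per tenant: this caps each tenant's request *rate*,
    # but every query costs the same token amount regardless of how
    # much cluster work it triggers -- a cheap query and an expensive
    # one look identical to the limiter.
    buckets = {"tenant-a": TokenBucket(rate=100.0, capacity=100.0)}
    ```

    A finer-grained scheme would charge each request a cost proportional to the work it causes (rows scanned, series touched), which is harder to know up front; that trade-off is why coarse limits plus reactive capacity scaling, as described above, are a common operational compromise.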