Buildkite incident

Increased error rates from Test Plan API

Major, Resolved

Buildkite experienced a major incident on March 10, 2026, affecting the REST API and lasting 8h 13m. The incident has been resolved; the full update timeline is below.

Started
Mar 10, 2026, 01:21 AM UTC
Resolved
Mar 10, 2026, 09:34 AM UTC
Duration
8h 13m
Detected by Pingoru
Mar 10, 2026, 01:21 AM UTC

Affected components

REST API

Update timeline

  1. investigating Mar 10, 2026, 01:21 AM UTC

    We've observed test splitting plan requests periodically timing out and falling back to non-intelligent splitting. Performance appears to have been back to normal as of an hour ago. We are continuing to investigate the root cause and fix the underlying issue.

  2. monitoring Mar 10, 2026, 02:25 AM UTC

    We have implemented several mitigations and continue working on fixing the underlying cause. Our team is actively monitoring the situation to ensure stability. We will provide further updates as we make progress on resolving this issue.

  3. resolved Mar 10, 2026, 09:34 AM UTC

    Our mitigations have resolved the elevated latency and the increased likelihood of suboptimal fallback test plans. We have also identified and fixed a blind spot in our automated alerting, which was previously unable to detect this scenario as an issue. Work continues this week to resolve the underlying performance issue by restructuring how the relevant data is ingested and accessed.
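
For readers unfamiliar with the behavior described in the timeline, the following is a minimal sketch of a timeout-with-fallback pattern for test splitting. It is not Buildkite's implementation: `fetch_plan`, `naive_split`, `plan_with_fallback`, and `timeout_s` are hypothetical names chosen for illustration, and the round-robin fallback is an assumption about what "non-intelligent splitting" might look like.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, List, Sequence, Tuple

Plan = List[List[str]]


def naive_split(tests: Sequence[str], node_count: int) -> Plan:
    """Hypothetical fallback: deal sorted test files round-robin, ignoring timing data."""
    ordered = sorted(tests)
    return [list(ordered[i::node_count]) for i in range(node_count)]


def plan_with_fallback(
    fetch_plan: Callable[[Sequence[str], int], Plan],
    tests: Sequence[str],
    node_count: int,
    timeout_s: float,
) -> Tuple[Plan, bool]:
    """Return (plan, used_fallback).

    fetch_plan stands in for a call to a remote planning service (assumption).
    If it does not answer within timeout_s seconds, use the naive split instead.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_plan, tests, node_count)
    try:
        return future.result(timeout=timeout_s), False
    except FutureTimeout:
        # The build still runs, but with a potentially unbalanced plan.
        return naive_split(tests, node_count), True
    finally:
        # Do not block the caller on a slow request that is still in flight.
        pool.shutdown(wait=False)
```

Returning the `used_fallback` flag alongside the plan makes fallbacks countable, which is one way an alert could be built on the fallback rate rather than on errors alone.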