Buildkite Outage History

Buildkite is up right now

Buildkite had 45 outages in the last 2 years totaling 84h 12m of downtime — averaging 1.8 incidents per month.

There were 45 Buildkite outages since June 30, 2025 totaling 84h 12m of downtime. Each is summarised below — incident details, duration, and resolution information.

Source: https://www.buildkitestatus.com
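
The summary figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not the method the page itself uses: the helper names are hypothetical, and the 24-month window is an assumption taken from the phrase "last 2 years". The page does not say which window its 1.8 figure divides by, so the per-month value computed here need not match it.

```python
from datetime import timedelta

# Figures quoted in the summary above.
INCIDENTS = 45
TOTAL_DOWNTIME = timedelta(hours=84, minutes=12)

# Assumption: "last 2 years" is read as 24 months.
ASSUMED_MONTHS = 24


def mean_downtime(total: timedelta, count: int) -> timedelta:
    """Average downtime per incident."""
    return total / count


def incidents_per_month(count: int, months: float) -> float:
    """Average number of incidents per month over the given window."""
    return count / months


avg = mean_downtime(TOTAL_DOWNTIME, INCIDENTS)
print(f"mean downtime per incident: {avg.total_seconds() / 60:.1f} min")
print(f"incidents per month: {incidents_per_month(INCIDENTS, ASSUMED_MONTHS):.3f}")
```

Dividing the total downtime by the incident count gives a mean of a little under two hours per incident, which is a useful baseline when reading the individual durations below.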

Major May 20, 2026

Delayed notifications

Detected by Pingoru
May 20, 2026, 04:40 PM UTC
Resolved
May 20, 2026, 05:39 PM UTC
Duration
59m
Affected: GitHub Commit Status Notifications, Email Notifications, Slack Notifications, Webhook Notifications
Timeline · 5 updates

Read the full incident report →
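
The Duration field in each record appears to be the gap between the Detected and Resolved timestamps, rounded down to whole minutes. The sketch below reproduces it for the incident above. It is an illustration only: the function names are hypothetical, the timestamp format string is inferred from how the dates are printed on this page, and it assumes Python 3.9+ and an English locale for month names.

```python
from datetime import datetime, timedelta

# Format inferred from timestamps such as "May 20, 2026, 04:40 PM UTC".
TIMESTAMP_FORMAT = "%b %d, %Y, %I:%M %p"


def parse_timestamp(text: str) -> datetime:
    """Parse a page timestamp; every timestamp on this page is in UTC."""
    return datetime.strptime(text.removesuffix(" UTC"), TIMESTAMP_FORMAT)


def format_duration(delta: timedelta) -> str:
    """Render a duration as '59m' or '1h 24m', dropping any seconds."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m" if hours else f"{minutes}m"


detected = parse_timestamp("May 20, 2026, 04:40 PM UTC")
resolved = parse_timestamp("May 20, 2026, 05:39 PM UTC")
print(format_duration(resolved - detected))  # prints "59m"
```

Some listed durations are a minute shorter than the displayed timestamps suggest, which is consistent with the underlying timestamps carrying seconds that the page does not show.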

Minor May 15, 2026

Delayed Test Engine ingestion processing

Detected by Pingoru
May 15, 2026, 06:51 AM UTC
Resolved
May 15, 2026, 07:35 AM UTC
Duration
44m
Affected: Ingestion
Timeline · 2 updates
  1. monitoring May 15, 2026, 06:51 AM UTC

    Ingestion of Test Engine execution data from an internal queue to a data store stalled. It has been resumed and is working through the backlog. Visibility of test executions from the past hour will be delayed for approximately one further hour. This has been a recurring issue; an architectural change is coming soon to eliminate this failure mode.

  2. resolved May 15, 2026, 07:35 AM UTC

    Processing of the backlog is complete.

Minor May 13, 2026

Error rates increasing

Detected by Pingoru
May 13, 2026, 03:14 PM UTC
Resolved
May 13, 2026, 03:34 PM UTC
Duration
19m
Affected: Web, REST API, Remote MCP Server
Timeline · 2 updates
  1. investigating May 13, 2026, 03:14 PM UTC

    We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

  2. resolved May 13, 2026, 03:34 PM UTC

    Additional capacity was added to our Redis caches. This triggered a failover between 15:10 and 15:14 UTC, causing a spike of errors on the REST and GraphQL APIs. Customers would also have seen some errors in the Buildkite UI during this period. We have been monitoring the situation since then and things have returned to baseline.

Minor May 6, 2026

Test Engine: Delayed processing of test result ingestion

Detected by Pingoru
May 06, 2026, 03:57 AM UTC
Resolved
May 06, 2026, 05:26 AM UTC
Duration
1h 28m
Affected: Ingestion
Timeline · 3 updates
  1. investigating May 06, 2026, 03:57 AM UTC

    A process writing test results to our Test Engine data store stalled. We've restarted the process and are seeing it catch up. We expect to be fully caught up on the backlog within the next couple of hours.

  2. monitoring May 06, 2026, 04:21 AM UTC

    We've identified the issue and the system is currently processing the backlog of test executions.

  3. resolved May 06, 2026, 05:26 AM UTC

    Processing of test execution ingestion data has successfully caught up.

Minor May 4, 2026

Increased latency and error rates

Detected by Pingoru
May 04, 2026, 06:02 AM UTC
Resolved
May 04, 2026, 06:30 AM UTC
Duration
27m
Affected: Agent API
Timeline · 2 updates
  1. investigating May 04, 2026, 06:02 AM UTC

    We're observing increased latency and error rates in the Agent API for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. resolved May 04, 2026, 06:30 AM UTC

    An increase in requests led to the API service being temporarily saturated. We have updated rate limits to ensure this doesn't recur and will add further resources if necessary.

Minor April 28, 2026

Increased dispatch latency and error rates

Detected by Pingoru
Apr 28, 2026, 06:00 PM UTC
Resolved
Apr 28, 2026, 07:16 PM UTC
Duration
1h 15m
Affected: Hosted Agents
Timeline · 4 updates
  1. investigating Apr 28, 2026, 06:00 PM UTC

    We're observing increased error rates and dispatch latency for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. identified Apr 28, 2026, 06:26 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Apr 28, 2026, 06:45 PM UTC

    We have mitigated the issue causing increased Hosted Agents dispatch latency and intermittent timeout errors for a subset of customers. We identified abnormal workload activity that was placing elevated load on a supporting service, and have now blocked that activity and applied additional protections. Service metrics have returned to normal, and we are continuing to monitor closely.

  4. resolved Apr 28, 2026, 07:16 PM UTC

    The previously elevated load on Hosted Agents dispatch has fully recovered.

Minor April 22, 2026

Auth failures with remote MCP server

Detected by Pingoru
Apr 22, 2026, 09:19 PM UTC
Resolved
Apr 22, 2026, 10:59 PM UTC
Duration
1h 40m
Timeline · 4 updates
  1. investigating Apr 22, 2026, 09:19 PM UTC

    We are currently investigating reports of authentication failures with the remote MCP server.

  2. investigating Apr 22, 2026, 10:07 PM UTC

    We are continuing to investigate errors when authenticating to the remote MCP server.

  3. monitoring Apr 22, 2026, 10:44 PM UTC

    We have rolled back a change on the remote MCP server that was contributing to authentication failures.

  4. resolved Apr 22, 2026, 10:59 PM UTC

    The issue is resolved.

Minor April 22, 2026

Delayed processing of test execution

Detected by Pingoru
Apr 22, 2026, 02:32 AM UTC
Resolved
Apr 22, 2026, 05:07 AM UTC
Duration
2h 35m
Affected: Ingestion
Timeline · 2 updates
  1. monitoring Apr 22, 2026, 02:32 AM UTC

    We noticed a lag in data processing, but our systems are operational and currently working through the backlog. We expect to be fully caught up within the next couple of hours.

  2. resolved Apr 22, 2026, 05:07 AM UTC

    The backlog has been cleared and all systems are fully operational. Thank you for your patience.

Minor March 31, 2026

Hosted Agents jobs immediately cancelled

Detected by Pingoru
Mar 31, 2026, 07:51 AM UTC
Resolved
Mar 31, 2026, 08:34 AM UTC
Duration
43m
Affected: Hosted Agents
Timeline · 3 updates
  1. investigating Mar 31, 2026, 07:51 AM UTC

    We have received reports from customers that they are unable to start builds on Hosted Agents. Their builds are immediately cancelled. We are investigating.

  2. identified Mar 31, 2026, 08:15 AM UTC

    We have identified the issue and are rolling out a fix.

  3. resolved Mar 31, 2026, 08:34 AM UTC

    We have deployed the fix and we have confirmed customer builds are working. If you encounter any further issues please contact support.

Minor March 27, 2026

504 errors viewing builds

Detected by Pingoru
Mar 27, 2026, 07:02 AM UTC
Resolved
Mar 27, 2026, 08:53 AM UTC
Duration
1h 51m
Affected: Web
Timeline · 4 updates
  1. investigating Mar 27, 2026, 07:02 AM UTC

    We're seeing an increase in 504 errors when viewing pipeline builds. We're investigating this now.

  2. identified Mar 27, 2026, 07:18 AM UTC

    We've identified a change which we think is the cause of this issue, and we're in the process of reverting it.

  3. monitoring Mar 27, 2026, 08:08 AM UTC

    The deploy to revert this change is complete and builds are loading normally. We will continue to monitor for any other issues.

  4. resolved Mar 27, 2026, 08:53 AM UTC

    The incident is now resolved. We are no longer seeing errors when viewing pipelines.

Minor March 25, 2026

Increased Delays with Hosted Agents

Detected by Pingoru
Mar 25, 2026, 02:26 PM UTC
Resolved
Mar 25, 2026, 03:50 PM UTC
Duration
1h 24m
Affected: MacOS, Linux (ARM64), Linux (AMD64), Hosted Agents
Timeline · 4 updates
  1. investigating Mar 25, 2026, 02:26 PM UTC

    We are currently investigating this issue.

  2. identified Mar 25, 2026, 02:30 PM UTC

    The issue has been identified as networking-related and is affecting Git Mirror cloning.

  3. monitoring Mar 25, 2026, 02:59 PM UTC

    The networking issue has been resolved, dispatch of Hosted Agents has returned to normal levels, and we are seeing no further issues with Git cloning. We are monitoring the situation.

  4. resolved Mar 25, 2026, 03:50 PM UTC

    This incident is now resolved. We are no longer seeing networking issues with Hosted Agents. These issues delayed the creation of agents for jobs and affected the resolution of external traffic and interactions with the cache, which in turn affected Git Mirror cloning.

Major March 11, 2026

Increased queue times on hosted agents

Detected by Pingoru
Mar 11, 2026, 07:50 PM UTC
Resolved
Mar 11, 2026, 09:14 PM UTC
Duration
1h 24m
Affected: Hosted Agents
Timeline · 3 updates
  1. investigating Mar 11, 2026, 07:50 PM UTC

    We are investigating reports of elevated queue times with hosted agents.

  2. monitoring Mar 11, 2026, 08:44 PM UTC

    We identified increased demand affecting hosted agent queue times. We have added additional capacity and are seeing recovery of hosted agent queue times.

  3. resolved Mar 11, 2026, 09:14 PM UTC

    This incident has been resolved.

Major March 10, 2026

Increased error rates from Test Plan API

Detected by Pingoru
Mar 10, 2026, 01:21 AM UTC
Resolved
Mar 10, 2026, 09:34 AM UTC
Duration
8h 13m
Affected: REST API
Timeline · 3 updates
  1. investigating Mar 10, 2026, 01:21 AM UTC

    We've observed test splitting plans periodically timing out and falling back to non-intelligent splitting. Performance appears to be back to normal as of an hour ago. We are continuing to investigate the root cause and solve the underlying issue.

  2. monitoring Mar 10, 2026, 02:25 AM UTC

    We have implemented several mitigations and continue working on fixing the underlying cause. Our team is actively monitoring the situation to ensure stability. We will provide further updates as we make progress on resolving this issue.

  3. resolved Mar 10, 2026, 09:34 AM UTC

    Our mitigations have resolved the elevated latency and the likelihood of suboptimal fallback test plans. We have also identified and fixed a blind spot in our automated alerting, which was previously unable to detect this scenario as an issue. Work continues this week to resolve the underlying performance issue by restructuring how the relevant data is ingested and accessed.

Minor March 7, 2026

Elevated ingestion latency for Test Engine

Detected by Pingoru
Mar 07, 2026, 12:21 AM UTC
Resolved
Mar 07, 2026, 01:05 AM UTC
Duration
44m
Affected: Ingestion
Timeline · 3 updates
  1. investigating Mar 07, 2026, 12:21 AM UTC

    We are investigating the elevated latency issue for Test Engine. Processing the backlog of test executions is taking longer than expected, so elevated ingestion latency remains.

  2. monitoring Mar 07, 2026, 12:56 AM UTC

    We've identified the issue and the system is currently processing the backlog of test executions.

  3. resolved Mar 07, 2026, 01:05 AM UTC

    Processing of test execution ingestion data has successfully caught up.

Minor March 6, 2026

Hosted Agents: Job start latency for a small subset of customers

Detected by Pingoru
Mar 06, 2026, 08:54 AM UTC
Resolved
Mar 06, 2026, 04:30 AM UTC
Duration
Timeline · 1 update
  1. resolved Mar 06, 2026, 08:54 AM UTC

    Buildkite Hosted Agents experienced degraded start-time performance due to a network partition issue in the Hosted Agents control plane. A small subset of customers may have seen delayed job starts during 04:40-04:50 UTC and 05:06-05:16 UTC. The issue has been resolved and we are monitoring to confirm stability.

Minor March 5, 2026

Slow artifact uploads

Detected by Pingoru
Mar 05, 2026, 10:14 PM UTC
Resolved
Mar 06, 2026, 10:23 AM UTC
Duration
12h 9m
Affected: Agent API
Timeline · 3 updates
  1. investigating Mar 05, 2026, 10:14 PM UTC

    We're investigating slow artifact uploads. This is isolated to artifacts; dispatch remains unaffected.

  2. monitoring Mar 06, 2026, 08:02 AM UTC

    Latency for artifact uploads has remained at normal levels for some time now, and we now have a mitigation in place for a common source of load going forward. We are continuing to monitor.

  3. resolved Mar 06, 2026, 10:23 AM UTC

    With artifact upload latency continuing to be stable, we are resolving this incident.

Major March 3, 2026

Latency issues

Detected by Pingoru
Mar 03, 2026, 09:51 PM UTC
Resolved
Mar 04, 2026, 05:24 AM UTC
Duration
7h 32m
Affected: Agent API, Job Queue
Timeline · 7 updates
  1. investigating Mar 03, 2026, 09:51 PM UTC

    We're seeing elevated job dispatch latency and Agent API latency across multiple shards. We're investigating.

  2. investigating Mar 03, 2026, 10:41 PM UTC

    We're still experiencing latency issues for the Agent API and job dispatch. We are continuing to investigate in order to identify the root cause.

  3. investigating Mar 03, 2026, 11:21 PM UTC

    We continue to experience high latency on some services. We're continuing to identify root causes.

  4. monitoring Mar 04, 2026, 12:11 AM UTC

    We've made some changes to address the issue and are seeing signs of recovery. We continue to monitor the situation.

  5. monitoring Mar 04, 2026, 01:06 AM UTC

    We've seen a small number of unrelated issues, each affecting a subset of customers. Most impact is resolved, but we are continuing to monitor impact for a small number of remaining customers. We are in touch with those customers directly.

  6. monitoring Mar 04, 2026, 03:29 AM UTC

    We continue to observe high latency on isolated infrastructure serving Agent API endpoints for a subset of customers. We are provisioning additional capacity to address this latency, and have informed impacted customers.

  7. resolved Mar 04, 2026, 05:24 AM UTC

    We have completed the provisioning of additional capacity mentioned in our last update, and error rates and response times have returned to normal. This incident is now resolved.

Minor February 26, 2026

Increased latency for secrets endpoints for some customers

Detected by Pingoru
Feb 26, 2026, 12:43 AM UTC
Resolved
Feb 26, 2026, 02:44 AM UTC
Duration
2h 1m
Affected: Agent API
Timeline · 3 updates
  1. investigating Feb 26, 2026, 12:43 AM UTC

    We're observing increased latency on secrets endpoints for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. monitoring Feb 26, 2026, 12:53 AM UTC

    We've increased the compute available to the secrets service, and have seen response times return to normal levels.

  3. resolved Feb 26, 2026, 02:44 AM UTC

    Response times have returned to normal. This incident is now resolved.
