Buildkite Outage History

Buildkite is up right now

Buildkite had 45 outages in the last 2 years totaling 57h 49m of downtime, averaging 1.8 incidents per month.

There have been 45 Buildkite outages since June 30, 2025, totaling 57h 49m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://www.buildkitestatus.com
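
Each entry below lists a detection time, a resolution time, and a duration. As a minimal sketch of how the duration follows from the two timestamps, assuming the timestamp layout printed in the entries (the names `STAMP_FORMAT`, `parse_stamp`, and `outage_duration` are hypothetical and not part of any Buildkite or Pingoru tooling):

```python
from datetime import datetime, timezone

# Assumed layout, matching the entries, e.g. "Jan 31, 2026, 10:06 AM UTC".
# Month abbreviations are parsed using the current locale (English assumed).
STAMP_FORMAT = "%b %d, %Y, %I:%M %p UTC"


def parse_stamp(text: str) -> datetime:
    """Parse one listed timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(text, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def outage_duration(detected: str, resolved: str) -> str:
    """Return the elapsed time between two listed timestamps as 'Xh Ym' or 'Ym'."""
    elapsed = parse_stamp(resolved) - parse_stamp(detected)
    minutes = int(elapsed.total_seconds() // 60)
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins}m" if hours else f"{mins}m"


# Timestamps taken from the January 31, 2026 entry.
print(outage_duration("Jan 31, 2026, 10:06 AM UTC", "Jan 31, 2026, 11:35 AM UTC"))
```

Some listed durations differ from this calculation by a minute, presumably because the displayed timestamps omit seconds.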

Minor February 3, 2026

Delays with GitHub Webhooks

Detected by Pingoru
Feb 03, 2026, 03:36 PM UTC
Resolved
Feb 03, 2026, 06:44 PM UTC
Duration
3h 7m
Affected: GitHub Commit Status Notifications, Job Queue, Hosted Agents
Timeline · 5 updates
  1. investigating Feb 03, 2026, 03:36 PM UTC

    We are currently investigating delays with receiving Webhooks from GitHub.

  2. investigating Feb 03, 2026, 04:20 PM UTC

    We are continuing to investigate this issue.

  3. monitoring Feb 03, 2026, 05:05 PM UTC

    We're continuing to see delays in webhook delivery from GitHub, which may result in delays in Builds triggering and Commit Status updates.

  4. monitoring Feb 03, 2026, 06:15 PM UTC

    We’re seeing improvements with webhook delivery from GitHub, and we continue to monitor the issue.

  5. resolved Feb 03, 2026, 06:44 PM UTC

    This incident has been resolved.

Major January 31, 2026

Test Engine services unavailable

Detected by Pingoru
Jan 31, 2026, 10:06 AM UTC
Resolved
Jan 31, 2026, 11:35 AM UTC
Duration
1h 29m
Affected: Web, Ingestion, REST API
Timeline · 4 updates
  1. investigating Jan 31, 2026, 10:06 AM UTC

    A third party database provider is experiencing an outage which is resulting in Test Engine services being unavailable. We will report updates as we receive them.

  2. investigating Jan 31, 2026, 10:54 AM UTC

    We have been notified by the third party provider that they are currently restoring services. We will update when we have more information.

  3. monitoring Jan 31, 2026, 11:15 AM UTC

    Our third party database provider has recovered their systems and we have worked through our backlog of test executions. We are continuing to monitor to ensure our systems are healthy.

  4. resolved Jan 31, 2026, 11:35 AM UTC

    We believe this issue is now resolved and all Test Engine systems are functioning. Ingestion of test data was delayed during this incident but we believe there has been no data loss. Customers using Buildkite Test Engine Client (bktec) to run tests will have experienced slower builds during the incident due to bktec using a sub-optimal test planning strategy while Test Engine APIs were unavailable.

Minor January 6, 2026

Slow Loading of Dashboard Pages

Detected by Pingoru
Jan 06, 2026, 01:07 AM UTC
Resolved
Jan 06, 2026, 02:39 AM UTC
Duration
1h 31m
Affected: Web
Timeline · 4 updates
  1. investigating Jan 06, 2026, 01:07 AM UTC

    We've identified that the clusters index page is performing poorly. We're currently investigating the issue, and will provide an update soon.

  2. identified Jan 06, 2026, 01:43 AM UTC

    We've identified this issue. We're deploying a fix.

  3. monitoring Jan 06, 2026, 02:05 AM UTC

    We've deployed a fix and verified that latency has improved. We are continuing to monitor.

  4. resolved Jan 06, 2026, 02:39 AM UTC

    This incident has been resolved.

Minor November 21, 2025

Elevated error rate for viewing logs

Detected by Pingoru
Nov 21, 2025, 04:11 AM UTC
Resolved
Nov 21, 2025, 05:03 AM UTC
Duration
52m
Affected: Web
Timeline · 3 updates
  1. identified Nov 21, 2025, 04:11 AM UTC

    We've spotted a slightly elevated error rate when viewing logs in the builds UI.

  2. monitoring Nov 21, 2025, 04:19 AM UTC

    We've resolved the issue, and error rates when viewing job logs in the UI have returned to normal.

  3. resolved Nov 21, 2025, 05:03 AM UTC

    We have confirmed that error rates for log retrieval have returned to nominal levels.

Notice October 20, 2025

Ongoing AWS incident

Detected by Pingoru
Oct 20, 2025, 08:38 AM UTC
Resolved
Oct 20, 2025, 11:52 AM UTC
Duration
3h 13m
Timeline · 4 updates
  1. investigating Oct 20, 2025, 08:38 AM UTC

    We're tracking an ongoing incident with an upstream provider, and are investigating any impact on Buildkite's services.

  2. monitoring Oct 20, 2025, 09:22 AM UTC

    We are continuing to track an incident with AWS' us-east-1 region, though impact to Buildkite's services appears to be minimal.

  3. monitoring Oct 20, 2025, 09:53 AM UTC

    We're seeing signs of recovery from AWS. Build notifications may have been delayed by this incident and its flow-on effect on other providers, but they are now being delivered successfully.

  4. resolved Oct 20, 2025, 11:52 AM UTC

    With the recovery of service within AWS, we are treating this incident as resolved. Customers running their agents in AWS may experience lingering issues as service within the region stabilizes.

Notice October 15, 2025

Increased latency

Detected by Pingoru
Oct 15, 2025, 03:34 AM UTC
Resolved
Oct 15, 2025, 03:50 AM UTC
Duration
16m
Affected: Web
Timeline · 4 updates
  1. investigating Oct 15, 2025, 03:34 AM UTC

    We're observing increased latency and error rates for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. identified Oct 15, 2025, 03:38 AM UTC

    We've identified a problematic deploy and are performing a rollback.

  3. monitoring Oct 15, 2025, 03:42 AM UTC

    The rollback is finishing and we will continue to monitor the latency and errors.

  4. resolved Oct 15, 2025, 03:50 AM UTC

    The rollback has been completed.

Minor September 30, 2025

Delayed processing of execution uploads

Detected by Pingoru
Sep 30, 2025, 07:44 PM UTC
Resolved
Oct 01, 2025, 06:28 AM UTC
Duration
10h 43m
Affected: Ingestion
Timeline · 4 updates
  1. investigating Sep 30, 2025, 07:44 PM UTC

    We're investigating delayed processing of execution uploads in the ingestion pipeline. You may see partial data appearing for Runs and Tests in Test Engine.

  2. identified Sep 30, 2025, 09:04 PM UTC

    We have identified an issue with processing uploads to Test Engine and are working on a fix.

  3. monitoring Sep 30, 2025, 10:12 PM UTC

    We have determined the root cause and have fixed the issue. The system is now processing a backlog of execution data.

  4. resolved Oct 01, 2025, 06:28 AM UTC

    We think the impact from the issue is over.

Minor September 29, 2025

Increased error rate for pipeline uploads

Detected by Pingoru
Sep 29, 2025, 06:06 AM UTC
Resolved
Sep 29, 2025, 06:45 AM UTC
Duration
38m
Affected: Job Queue
Timeline · 3 updates
  1. investigating Sep 29, 2025, 06:06 AM UTC

    We've been alerted to an increased error rate for pipeline uploads, and are investigating the cause.

  2. monitoring Sep 29, 2025, 06:23 AM UTC

    We have identified a faulty deploy, and have performed a rollback to resolve the issue.

  3. resolved Sep 29, 2025, 06:45 AM UTC

    A handful of customers were unable to upload pipelines containing steps with soft-failure configurations, resulting in failed builds. This bug has since been rolled back. Pipeline uploads are now working again as expected.

Notice September 19, 2025

Increased latency and timeouts on REST and GraphQL API

Detected by Pingoru
Sep 19, 2025, 07:47 PM UTC
Resolved
Sep 19, 2025, 09:17 PM UTC
Duration
1h 29m
Timeline · 4 updates
  1. investigating Sep 19, 2025, 07:47 PM UTC

    We are investigating increased latency and timeouts with the REST and GraphQL endpoints.

  2. identified Sep 19, 2025, 08:30 PM UTC

    We have discovered a rogue Puma process contributing to the high latency and have restarted it. We are now investigating other processes to determine if they are exhibiting the same rogue behaviour.

  3. monitoring Sep 19, 2025, 09:07 PM UTC

    We've identified a large increase in load which has now been resolved, and we're monitoring to ensure request latency returns to normal operating levels.

  4. resolved Sep 19, 2025, 09:17 PM UTC

    We are happy that request latency has dropped to normal levels so consider this issue resolved.

Minor August 13, 2025

Increased latency and error rates in Agent API

Detected by Pingoru
Aug 13, 2025, 01:28 AM UTC
Resolved
Aug 13, 2025, 03:01 AM UTC
Duration
1h 33m
Affected: Agent API
Timeline · 3 updates
  1. investigating Aug 13, 2025, 01:28 AM UTC

    We're observing increased latency and error rates in the Agent API for some customers. We're currently investigating and will provide status updates as they become available.

  2. monitoring Aug 13, 2025, 02:30 AM UTC

    We're starting to see recovery in latency and error rate in the affected services. We're continuing to monitor, and will provide more status updates as they become available.

  3. resolved Aug 13, 2025, 03:01 AM UTC

    Error rates and latency in the Agent API have returned to background levels! We're continuing to investigate the root cause of this issue.

Minor August 1, 2025

High dispatch latency for hosted agents

Detected by Pingoru
Aug 01, 2025, 03:37 PM UTC
Resolved
Aug 01, 2025, 04:50 PM UTC
Duration
1h 13m
Affected: Hosted Agents
Timeline · 5 updates
  1. investigating Aug 01, 2025, 03:37 PM UTC

    We're experiencing high job dispatch latency for hosted agents. We're working to identify the root cause.

  2. investigating Aug 01, 2025, 03:48 PM UTC

    System performance has returned to normal. We are still working to identify the root cause.

  3. investigating Aug 01, 2025, 04:33 PM UTC

    Service status remains operational. We are working with our hosting partner to identify the root cause.

  4. identified Aug 01, 2025, 04:50 PM UTC

    Unexpected load in an upstream database caused a period of high latency for hosted agent dispatch.

  5. resolved Aug 01, 2025, 04:50 PM UTC

    This incident is now resolved.

Minor July 30, 2025

Service degradation with flaky test detection

Detected by Pingoru
Jul 30, 2025, 09:29 AM UTC
Resolved
Jul 30, 2025, 11:10 AM UTC
Duration
1h 41m
Affected: Web, Ingestion
Timeline · 3 updates
  1. investigating Jul 30, 2025, 09:29 AM UTC

    We are observing delays in processing flaky test detection and are currently investigating.

  2. monitoring Jul 30, 2025, 10:41 AM UTC

    We have identified and rectified the cause of this delay. The system is now working through the backlog of flaky test data.

  3. resolved Jul 30, 2025, 11:10 AM UTC

    Flaky test detection has returned to normal operation and processing is no longer delayed.

Major July 24, 2025

Delayed notifications for a subset of customers

Detected by Pingoru
Jul 24, 2025, 02:49 PM UTC
Resolved
Jul 24, 2025, 03:38 PM UTC
Duration
49m
Timeline · 4 updates
  1. investigating Jul 24, 2025, 02:49 PM UTC

    We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

  2. identified Jul 24, 2025, 03:09 PM UTC

    We've identified the cause of the issue and are working on a fix.

  3. monitoring Jul 24, 2025, 03:32 PM UTC

    We've deployed a fix and are starting to see recovery. We're continuing to monitor.

  4. resolved Jul 24, 2025, 03:38 PM UTC

    We've confirmed that our fix addressed the underlying cause of the delayed notifications. Only a subset of customers would have seen an impact from this incident.

Minor June 30, 2025

Error spike on Test Engine pages and API endpoints

Detected by Pingoru
Jun 30, 2025, 11:09 PM UTC
Resolved
Jul 01, 2025, 12:03 AM UTC
Duration
53m
Affected: Web, REST API
Timeline · 3 updates
  1. investigating Jun 30, 2025, 11:09 PM UTC

    We've spotted an error spike on Test Engine pages and API endpoints.

  2. monitoring Jun 30, 2025, 11:35 PM UTC

    The issue has been resolved, and we are monitoring to ensure everything is functioning properly.

  3. resolved Jul 01, 2025, 12:03 AM UTC

    We’ve been monitoring this issue, and everything is stable and working properly.

Minor June 30, 2025

Delay in processing Test Engine results from S3

Detected by Pingoru
Jun 30, 2025, 07:14 PM UTC
Resolved
Jul 01, 2025, 01:28 AM UTC
Duration
6h 14m
Affected: Ingestion
Timeline · 4 updates
  1. investigating Jun 30, 2025, 07:14 PM UTC

    We are investigating an issue with Test Engine results uploaded via S3 not being processed. We will provide a further update in 30 minutes.

  2. monitoring Jun 30, 2025, 07:30 PM UTC

    We have identified an issue with processing test data uploaded via S3 and are now working through the backlog.

  3. monitoring Jun 30, 2025, 11:42 PM UTC

    We have resolved the issue and are working through the backlog of data. Some customers may see incomplete information in Test Engine, but rest assured all data will be fully processed and no data has been lost.

  4. resolved Jul 01, 2025, 01:28 AM UTC

    This incident has been resolved.
