Buildkite Outage History

Buildkite is up right now

There have been 19 Buildkite outages since February 3, 2026, totalling 54h 44m of downtime. Each is summarised below, with incident details, duration, and resolution information.

Source: https://www.buildkitestatus.com

Major May 1, 2026

Buildkite service disruption

Detected by Pingoru
May 01, 2026, 02:41 AM UTC
Resolved
May 01, 2026, 03:30 AM UTC
Duration
49m
Timeline · 3 updates
  1. identified May 01, 2026, 02:41 AM UTC

    We've identified a service change that is causing a service disruption. We are reverting this change.

  2. monitoring May 01, 2026, 02:52 AM UTC

    We've corrected the issue that caused this disruption and normal service has been restored. We are monitoring the situation now.

  3. resolved May 01, 2026, 03:30 AM UTC

    This incident has been resolved.

Read the full incident report →

Minor April 29, 2026

Increased latency and error rates

Detected by Pingoru
Apr 29, 2026, 05:43 PM UTC
Resolved
Apr 29, 2026, 06:48 PM UTC
Duration
1h 4m
Affected: Agent API
Timeline · 3 updates
  1. investigating Apr 29, 2026, 05:43 PM UTC

    We're observing increased latency and error rates for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. monitoring Apr 29, 2026, 06:18 PM UTC

    We have identified and fixed the issue with the underlying database for a subset of customers. We are now monitoring the issue.

  3. resolved Apr 29, 2026, 06:48 PM UTC

    We have confirmed that latency and error rates have returned to normal for impacted customers.

Read the full incident report →

Minor April 28, 2026

Increased dispatch latency and error rates

Detected by Pingoru
Apr 28, 2026, 06:00 PM UTC
Resolved
Apr 28, 2026, 07:16 PM UTC
Duration
1h 15m
Affected: Hosted Agents
Timeline · 4 updates
  1. investigating Apr 28, 2026, 06:00 PM UTC

    We're observing increased error rates and dispatch latency for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. identified Apr 28, 2026, 06:26 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Apr 28, 2026, 06:45 PM UTC

    We have mitigated the issue causing increased Hosted Agents dispatch latency and intermittent timeout errors for a subset of customers. We identified abnormal workload activity that was placing elevated load on a supporting service, and have now blocked that activity and applied additional protections. Service metrics have returned to normal, and we are continuing to monitor closely.

  4. resolved Apr 28, 2026, 07:16 PM UTC

    The previously elevated load on Hosted Agents dispatch has fully recovered.

Read the full incident report →

Minor April 22, 2026

Auth failures with remote MCP server

Detected by Pingoru
Apr 22, 2026, 09:19 PM UTC
Resolved
Apr 22, 2026, 10:59 PM UTC
Duration
1h 40m
Timeline · 4 updates
  1. investigating Apr 22, 2026, 09:19 PM UTC

    We are currently investigating reports of authentication failures with the remote MCP server.

  2. investigating Apr 22, 2026, 10:07 PM UTC

    We are continuing to investigate errors when authenticating to the remote MCP server.

  3. monitoring Apr 22, 2026, 10:44 PM UTC

    We have rolled back a change on the remote MCP server that was contributing to authentication failures.

  4. resolved Apr 22, 2026, 10:59 PM UTC

    The issue is resolved.

Read the full incident report →

Minor April 22, 2026

Delayed processing of test execution

Detected by Pingoru
Apr 22, 2026, 02:32 AM UTC
Resolved
Apr 22, 2026, 05:07 AM UTC
Duration
2h 35m
Affected: Ingestion
Timeline · 2 updates
  1. monitoring Apr 22, 2026, 02:32 AM UTC

    We noticed a lag in data processing, but our systems are operational and currently working through the backlog. We expect to be fully caught up within the next couple of hours.

  2. resolved Apr 22, 2026, 05:07 AM UTC

    The backlog has been cleared and all systems are fully operational. Thank you for your patience.

Read the full incident report →

Major April 8, 2026

Degraded performance and increased error rates

Detected by Pingoru
Apr 08, 2026, 10:26 PM UTC
Resolved
Apr 08, 2026, 11:12 PM UTC
Duration
45m
Affected: GitHub Commit Status Notifications, Email Notifications, Agent API, Slack Notifications, Webhook Notifications
Timeline · 4 updates
  1. investigating Apr 08, 2026, 10:26 PM UTC

    We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

  2. monitoring Apr 08, 2026, 10:40 PM UTC

    We have identified and fixed the issue. We are monitoring and seeing signs of improvement.

  3. resolved Apr 08, 2026, 11:12 PM UTC

    We experienced an issue that caused a brief increase in errors for the Agent API. This also impacted latency for notifications. All notifications were stored in a queue and processed. Latency is now back to normal.

  4. postmortem Apr 21, 2026, 06:04 AM UTC

    ## Service Impact

    On April 8th from 22:10 to 22:15 UTC and from 22:33 to 22:38 UTC, all customers would have experienced increased latency browsing [buildkite.com](http://buildkite.com) and using the Buildkite REST and GraphQL APIs, as well as latency of up to 12 minutes when creating triggered and scheduled builds. A portion of customers also experienced increased latency and error rates for the Agent API; the impact was not evenly distributed, so some customers experienced a higher impact than others. Impacted customers experienced p99 latency of more than 1 second and error rates of 0.5%. Between 22:10 and 23:03 UTC, a majority of customers experienced notification latency of between 5 and 33 minutes, including GitHub commit statuses and webhook delivery, as well as delays of up to 40 seconds in processing incoming webhooks.

    ## Incident Summary

    Our engineers noticed an increase in exceptions and shortly afterwards, at 22:16 UTC, received an alert for high CPU utilisation on a single node in a Redis cluster. Upon investigation we found a rate limiter was hot-spotting on a single node within the cluster. Removing this limit at 22:33 UTC mitigated the high Redis CPU utilisation, with utilisation on the affected node falling to 5%, where it had been 50% prior to the incident. Further investigation revealed that high load on a replica database was also contributing to the high latency; this recovered at 22:37 UTC after a third replica was added.

    The rate limit responsible for the high Redis load had caused hot-spotting on a single key because it was applied across all organisations. The limit was introduced in response to a previous incident, but had not been required since early 2024, when our work to horizontally shard our Pipelines database distributed the load and enabled higher scalability.

    ## Changes we've made

    These are the changes we've made in response to this incident:

    * Removed the global rate limit that was contributing to a significant proportion of load
    * Increased the number of read replicas for our customer information database to 3 and increased the instance size
    * Reviewed our remaining rate limits and confirmed that no other rate limits apply globally to all shards
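
The hot-spotting described in this postmortem is a known failure mode for clustered Redis: every key hashes to exactly one slot, owned by one node, so a rate limit tracked under a single shared key sends every check to that node. A minimal sketch of the difference between a global and a per-organisation fixed-window limiter, in Ruby with the `redis` gem (the key names, limit, and window are illustrative assumptions, not Buildkite's actual implementation):

```ruby
require "redis"

# A single Redis connection; in a cluster the same key-to-slot logic applies.
redis = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))

WINDOW_SECONDS = 60
LIMIT = 1_000

# Fixed-window limiter: the key choice decides which cluster node absorbs
# the traffic, because each key hashes to exactly one slot on one node.
def allow?(redis, key)
  count = redis.incr(key)                          # atomic increment
  redis.expire(key, WINDOW_SECONDS) if count == 1  # start the window on first hit
  count <= LIMIT
end

# Global key: every request in the fleet increments the same key,
# pinning all rate-limit load to the one node that owns its slot.
allow?(redis, "rate_limit:global")

# Per-organisation key: keys spread across hash slots, so the same
# total load is distributed across the cluster's nodes.
org_id = 42 # hypothetical organisation id
allow?(redis, "rate_limit:org:#{org_id}")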

Read the full incident report →

Minor March 31, 2026

Hosted Agents jobs immediately cancelled

Detected by Pingoru
Mar 31, 2026, 07:51 AM UTC
Resolved
Mar 31, 2026, 08:34 AM UTC
Duration
43m
Affected: Hosted Agents
Timeline · 3 updates
  1. investigating Mar 31, 2026, 07:51 AM UTC

    We have received reports from customers that they are unable to start builds on Hosted Agents. Their builds are immediately cancelled. We are investigating.

  2. identified Mar 31, 2026, 08:15 AM UTC

    We have identified the issue and are rolling out a fix.

  3. resolved Mar 31, 2026, 08:34 AM UTC

    We have deployed the fix and we have confirmed customer builds are working. If you encounter any further issues please contact support.

Read the full incident report →

Minor March 27, 2026

504 errors viewing builds

Detected by Pingoru
Mar 27, 2026, 07:02 AM UTC
Resolved
Mar 27, 2026, 08:53 AM UTC
Duration
1h 51m
Affected: Web
Timeline · 4 updates
  1. investigating Mar 27, 2026, 07:02 AM UTC

    We're seeing an increase in 504 errors when viewing pipeline builds. We're investigating this now.

  2. identified Mar 27, 2026, 07:18 AM UTC

    We've identified a change which we think is the cause of this issue, and we're in the process of reverting it.

  3. monitoring Mar 27, 2026, 08:08 AM UTC

    The deploy to revert this change is complete and builds are loading normally. We will continue to monitor for any other issues.

  4. resolved Mar 27, 2026, 08:53 AM UTC

    The incident is now resolved. We are no longer seeing errors when viewing pipelines.

Read the full incident report →

Minor March 25, 2026

Increased Delays with Hosted Agents

Detected by Pingoru
Mar 25, 2026, 02:26 PM UTC
Resolved
Mar 25, 2026, 03:50 PM UTC
Duration
1h 24m
Affected: MacOS, Linux (ARM64), Linux (AMD64), Hosted Agents
Timeline · 4 updates
  1. investigating Mar 25, 2026, 02:26 PM UTC

    We are currently investigating this issue.

  2. identified Mar 25, 2026, 02:30 PM UTC

    The issue has been identified as network-related and is affecting Git Mirror cloning.

  3. monitoring Mar 25, 2026, 02:59 PM UTC

    The networking issue has been resolved; dispatch of Hosted Agents has returned to normal levels, and there are no further issues with Git cloning. We are monitoring the situation.

  4. resolved Mar 25, 2026, 03:50 PM UTC

    This incident is now resolved. We are no longer seeing networking issues with Hosted Agents. The issues had delayed agent creation for jobs and affected external traffic resolution and cache interactions, which impacted Git Mirror cloning.

Read the full incident report →

Major March 11, 2026

Increased queue times on hosted agents

Detected by Pingoru
Mar 11, 2026, 07:50 PM UTC
Resolved
Mar 11, 2026, 09:14 PM UTC
Duration
1h 24m
Affected: Hosted Agents
Timeline · 3 updates
  1. investigating Mar 11, 2026, 07:50 PM UTC

    We are investigating reports of elevated queue times with hosted agents.

  2. monitoring Mar 11, 2026, 08:44 PM UTC

    We identified increased demand affecting hosted agent queue times. We have added additional capacity and are seeing recovery of hosted agent queue times.

  3. resolved Mar 11, 2026, 09:14 PM UTC

    This incident has been resolved.

Read the full incident report →

Major March 10, 2026

Increased error rates from Test Plan API

Detected by Pingoru
Mar 10, 2026, 01:21 AM UTC
Resolved
Mar 10, 2026, 09:34 AM UTC
Duration
8h 13m
Affected: REST API
Timeline · 3 updates
  1. investigating Mar 10, 2026, 01:21 AM UTC

    We've observed periodic test-splitting plans timing out and falling back to non-intelligent splitting. Performance appears to be back to normal as of an hour ago. We are continuing to investigate the root cause and solve the underlying issue.

  2. monitoring Mar 10, 2026, 02:25 AM UTC

    We have implemented several mitigations and are continuing to work on fixing the underlying cause. Our team is actively monitoring the situation to ensure stability. We will provide further updates as we make progress on resolving this issue.

  3. resolved Mar 10, 2026, 09:34 AM UTC

    Our mitigations have resolved the elevated latency and likelihood of suboptimal fallback test plans. We have also identified and fixed a blind-spot in our automated alerting, which was previously unable to detect this scenario as an issue. Work continues this week to resolve the underlying performance issue by restructuring how the relevant data is ingested and accessed.

Read the full incident report →

Minor March 7, 2026

Elevated ingestion latency for Test Engine

Detected by Pingoru
Mar 07, 2026, 12:21 AM UTC
Resolved
Mar 07, 2026, 01:05 AM UTC
Duration
44m
Affected: Ingestion
Timeline · 3 updates
  1. investigating Mar 07, 2026, 12:21 AM UTC

    We are investigating the elevated latency issue for Test Engine. Processing the backlog of test executions is taking longer than expected, so elevated ingestion latency remains.

  2. monitoring Mar 07, 2026, 12:56 AM UTC

    We've identified the issue, and the system is currently processing the backlog of test executions.

  3. resolved Mar 07, 2026, 01:05 AM UTC

    Processing of test execution ingestion data has successfully caught up.

Read the full incident report →

Minor March 6, 2026

Hosted Agents: Job start latency for a small subset of customers

Detected by Pingoru
Mar 06, 2026, 08:54 AM UTC
Resolved
Mar 06, 2026, 08:54 AM UTC
Duration
n/a (reported retroactively; impact occurred 04:40–04:50 and 05:06–05:16 UTC)
Timeline · 1 update
  1. resolved Mar 06, 2026, 08:54 AM UTC

    Buildkite Hosted Agents experienced degraded start-time performance due to a network partition issue in the Hosted Agents control plane. A small subset of customers may have seen delayed job starts during 04:40-04:50 UTC and 05:06-05:16 UTC. The issue has been resolved and we are monitoring to confirm stability.

Read the full incident report →

Minor March 5, 2026

Slow artifact uploads

Detected by Pingoru
Mar 05, 2026, 10:14 PM UTC
Resolved
Mar 06, 2026, 10:23 AM UTC
Duration
12h 9m
Affected: Agent API
Timeline · 3 updates
  1. investigating Mar 05, 2026, 10:14 PM UTC

    We're investigating slow artifact uploads. This is isolated to artifacts; dispatch remains unaffected.

  2. monitoring Mar 06, 2026, 08:02 AM UTC

    Latency for artifact uploads has remained at normal levels for some time now, and we now have a mitigation in place for a common source of load going forward. We are continuing to monitor.

  3. resolved Mar 06, 2026, 10:23 AM UTC

    With artifact upload latency continuing to be stable, we are resolving this incident.

Read the full incident report →

Major March 3, 2026

Latency issues

Detected by Pingoru
Mar 03, 2026, 09:51 PM UTC
Resolved
Mar 04, 2026, 05:24 AM UTC
Duration
7h 32m
Affected: Agent API, Job Queue
Timeline · 7 updates
  1. investigating Mar 03, 2026, 09:51 PM UTC

    We're seeing elevated job dispatch latency and Agent API latency across multiple shards. We're investigating.

  2. investigating Mar 03, 2026, 10:41 PM UTC

    We're still experiencing latency issues with the Agent API and job dispatch. We are continuing to investigate to identify the root cause.

  3. investigating Mar 03, 2026, 11:21 PM UTC

    We continue to experience high latency on some services. We're continuing to identify root causes.

  4. monitoring Mar 04, 2026, 12:11 AM UTC

    We've made some changes to address the issue and are seeing signs of recovery. We continue to monitor the situation.

  5. monitoring Mar 04, 2026, 01:06 AM UTC

    We've seen a small number of unrelated issues, each affecting a subset of customers. Most impact is resolved, but we are continuing to monitor impact for a small number of remaining customers. We are in touch with those customers directly.

  6. monitoring Mar 04, 2026, 03:29 AM UTC

    We continue to observe high latency on isolated infrastructure serving Agent API endpoints for a subset of customers. We are provisioning additional capacity to address this latency, and have informed impacted customers.

  7. resolved Mar 04, 2026, 05:24 AM UTC

    We have completed the provisioning of additional capacity mentioned in our last update, and error rates and response times have returned to normal. This incident is now resolved.

Read the full incident report →

Major February 26, 2026

Increased dispatch latency

Detected by Pingoru
Feb 26, 2026, 07:10 PM UTC
Resolved
Feb 27, 2026, 02:30 AM UTC
Duration
7h 20m
Affected: Job Queue
Timeline · 6 updates
  1. identified Feb 26, 2026, 07:10 PM UTC

    Some customers are experiencing increased latency for jobs being assigned to agents. We have identified the cause and are working on mitigations.

  2. monitoring Feb 26, 2026, 07:36 PM UTC

    We're seeing signs of recovery and will continue to monitor.

  3. investigating Feb 26, 2026, 11:41 PM UTC

    We're seeing ongoing latency impact for a subset of customers. Some customers are seeing signs of improvement, but we are continuing to investigate the issue.

  4. monitoring Feb 27, 2026, 12:57 AM UTC

    We've seen recovery for the remaining subset of customers. We will continue to monitor.

  5. resolved Feb 27, 2026, 02:30 AM UTC

    We have seen a full recovery of service, and have a good understanding of the underlying cause. We will publish a post-incident review next week.

  6. postmortem Mar 04, 2026, 02:33 AM UTC

    ## Service Impact

    Between approximately 18:00 UTC and 22:50 UTC on February 26, 2026, a subset of customers experienced increased latency when dispatching jobs to agents. Affected customers observed agents sitting idle for several minutes despite having matching jobs waiting in the queue. Job dispatch eventually succeeded, but with significantly elevated latency. The impact was concentrated on specific database shards but affected customers across multiple shards over the course of the incident.

    ## Incident Summary

    A database maintenance task designed to improve job ordering performance was running across all production database shards. However, this task was itself contributing significant database load, which impacted normal job dispatch and pipeline upload operations. This increased database load caused dispatch operations to queue up, resulting in the observed delays in matching jobs to agents. The issue was compounded by a connection pooling service having several containers running on underperforming infrastructure, which reduced the available database throughput.

    Contributing factors:

    * The maintenance task consumed limited database resources, which conflicted with concurrent dispatch operations
    * The task was running simultaneously across all database shards, amplifying the impact
    * A connection pooling service had degraded capacity due to infrastructure imbalance

    ## Changes we're making

    * The maintenance task has been paused, and in future will be run during low-traffic periods and on individual shards rather than all shards simultaneously
    * The connection pooling service has been rebalanced to ensure consistent performance
    * We are improving our monitoring and dashboards to enable faster identification of lock contention issues during incidents
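
The remediation here (run the maintenance task shard by shard, inside a low-traffic window, instead of fleet-wide) is simple to sketch. A hedged illustration in Ruby; the shard list, quiet-hours window, and `run_maintenance` helper are hypothetical stand-ins, not Buildkite's code:

```ruby
# Hypothetical shard list and helpers; illustrative only.
SHARDS = %w[pipelines_shard_a pipelines_shard_b pipelines_shard_c]
LOW_TRAFFIC_HOURS_UTC = (2..5) # assumed quiet window, hours of the day in UTC

def low_traffic?
  LOW_TRAFFIC_HOURS_UTC.cover?(Time.now.utc.hour)
end

def run_maintenance(shard)
  # Per-shard job-ordering maintenance would run here.
  puts "running maintenance on #{shard}"
end

# One shard at a time, and only inside the quiet window, so the task
# never competes with dispatch load on every shard simultaneously.
SHARDS.each do |shard|
  sleep 60 until low_traffic? # wait for the window to open
  run_maintenance(shard)
end
```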

Read the full incident report →

Minor February 26, 2026

Increased latency for secrets endpoints for some customers

Detected by Pingoru
Feb 26, 2026, 12:43 AM UTC
Resolved
Feb 26, 2026, 02:44 AM UTC
Duration
2h 1m
Affected: Agent API
Timeline · 3 updates
  1. investigating Feb 26, 2026, 12:43 AM UTC

    We're observing increased latency on secrets endpoints for a subset of our customers. We're currently investigating and will provide status updates as they become available.

  2. monitoring Feb 26, 2026, 12:53 AM UTC

    We've increased the compute available to the secrets service, and have seen response times return to normal levels.

  3. resolved Feb 26, 2026, 02:44 AM UTC

    Response times have returned to normal. This incident is now resolved.

Read the full incident report →

Minor February 13, 2026

Some jobs not dispatching

Detected by Pingoru
Feb 13, 2026, 02:59 AM UTC
Resolved
Feb 13, 2026, 02:59 AM UTC
Duration
n/a (reported retroactively; impact lasted from 01:33 to 04:49 UTC on February 12)
Timeline · 2 updates
  1. resolved Feb 13, 2026, 02:59 AM UTC

    Between 2026-02-12 01:33 and 04:49 UTC, an elevated error rate in job dispatch was observed. This affected a subset of jobs. Affected jobs were unable to be dispatched during the disruption window, but were eventually dispatched due to automatic retries. A subset of pipeline step uploads were also affected; the agent would check for successful step upload over a 5-minute period before timing out. This occurred due to a bug in a configuration change that resulted in some Sidekiq instances not receiving all required database connection configuration. The configuration change was subsequently reverted, with further investigation underway.

  2. postmortem Feb 19, 2026, 02:03 AM UTC

    ## Service Impact

    Between 01:33 UTC and 04:49 UTC on February 12, 2026, a subset of Sidekiq background workers experienced database connection failures, affecting several types of operations over approximately **3 hours and 16 minutes**.

    For builds where pipeline steps had already been uploaded, the impact was limited to delays. Job dispatching, notifications, and job completion processing were delayed but retried successfully. **All jobs that had already been uploaded were eventually dispatched**, and no jobs were lost. Across all shards, approximately 0.35% of job dispatches and 0.5% of notifications needed retries. However, the impact was not evenly distributed. The underlying issue affected some worker containers and not others, and because containers are assigned to process work for specific database shards, some shards experienced significantly higher error rates while others were completely unaffected. Because each customer's data is assigned to a specific shard, customers on the worst-affected shards would have seen a much higher failure rate than these fleet-wide figures suggest.

    For builds where a pipeline upload was still in progress, the upload itself could have failed entirely, resulting in a failed build. In these cases, customers would have needed to retry the build manually.

    ## Incident Summary

    We were in the process of migrating how environment variables are provided to our Sidekiq background workers running on AWS Fargate. The migration involved two changes: an application-level change to support loading environment configuration from S3, and an infrastructure change to have the Sidekiq services rely on that new loading mechanism. The application change was deployed on February 10 without issue. At this point we verified across all Sidekiq services that the environment is correct when loading from S3, by invoking the S3 environment loading mechanism on a single container per Sidekiq service and comparing the result with existing Sidekiq containers.

    On February 12 at approximately 01:30 UTC, the infrastructure change was applied, and a subsequent application deployment caused new worker containers to launch using the S3 configuration. Under the new configuration, environment files are downloaded from S3 at boot time and written to a local path. However, some of our Sidekiq services use `sidekiqswarm`, which forks multiple child processes. Gem and application preloading is disabled for these services, meaning each child process independently boots the full Rails application (including the S3 download step) after the fork. When multiple child processes booted simultaneously, they each attempted to download the same S3 files and write them to the same local path concurrently. This introduced a race condition: one process could read a file right when another process zeroed it out before writing to it, resulting in a process missing environment variables.

    This meant that database URL environment variables, which tell the application how to connect to PostgreSQL over the network, were absent in some workers. Without them, the PostgreSQL client fell back to attempting a local Unix socket connection, which does not exist in the containerized environment. Sidekiq workers are responsible for a range of asynchronous operations including job dispatching, notifications, and job completion processing. Workers that could not connect to the database would fail to perform any of these operations and would automatically retry, either succeeding when the retry was performed on an unaffected worker, or after the configuration was rolled back. However, pipeline upload operations that timed out during the incident window could fail outright, causing the associated build to fail.

    The issue was not immediately detected for two reasons. First, the race condition meant it was effectively random whether a given container was affected; approximately 0.35% of job dispatches and 0.5% of notifications needed retries across the incident window. Second, the failures did not trigger our existing alerts. When a worker hit a database connection failure, an internal error handling layer caught the error and silently rescheduled the job for retry instead of raising an exception. Our metrics layer saw no raised exception from these attempts, so they were indistinguishable from successful executions. An existing monitor that tracks P99 job dispatch duration saw no increase, because the failures happened almost instantly and looked identical to fast successes. The monitor is effective at detecting slow job processing, but this failure mode (fast, silent, and rescheduled) fell outside its coverage. A high volume of database connection errors was present in application logs, which is ultimately how responders identified the root cause during investigation. These factors — a low overall error rate, silent rescheduling, and a duration-based monitor that couldn't detect fast failures — contributed to the delay in detection.

    **Contributing factors:**

    * A race condition in the multi-process Sidekiq setup caused concurrent S3 environment file downloads to write to the same local path, allowing processes to read zero-byte files and boot without critical environment variables.
    * There was no startup assertion to fail fast if critical environment variables such as database configuration were missing or empty after the environment loading step.
    * Database connection errors in affected workers were caught by an internal error handling layer and silently rescheduled rather than raised as exceptions, preventing them from appearing in error metrics or alerting.
    * Existing monitoring was designed to detect slow job processing rather than fast failures, so this failure mode fell outside its coverage.

    **Key timestamps (UTC):**

    | Time | Event |
    | --- | --- |
    | 01:33 | Impact began — first database connection errors appeared |
    | 03:57 | Incident declared; investigation began |
    | 04:19 | Root cause identified; revert of infrastructure change initiated |
    | 04:37 | Revert deployed via application deployment |
    | 04:49 | Impact ended — error rates returned to zero |

    * **Total duration of impact: 3 hours and 16 minutes.**
    * **Time to detection: 2 hours and 24 minutes.**
    * **Time from detection to resolution: 52 minutes.**

    ## Changes we're making

    * **Startup validation for critical environment variables (deployed):** We've added a fail-fast assertion during application boot to verify that database URLs and other critical configuration values are present and non-empty after environment loading completes.
    * **Atomic S3 file downloads (deployed):** We've updated the S3 environment file download process to avoid the possibility of processes reading incomplete environment files.
    * **Improved job processing monitoring (in progress):** We are adding error-rate monitoring for job processing to complement our existing latency-based monitors, ensuring that workers which are picking up jobs but failing to complete them are detected regardless of how quickly the failure occurs.
    * **Incremental rollout for worker configuration changes (in progress):** Future changes to how workers load critical configuration will be deployed to a subset of services first, allowing us to detect problems before they affect the full fleet.
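
Two of the deployed fixes are straightforward to picture in code. Below is a minimal Ruby sketch of (a) an atomic file write, where content goes to a temporary path and is moved into place with `File.rename`, so a concurrent reader never observes a truncated or zeroed file, and (b) a fail-fast boot assertion for critical environment variables. The paths and variable list are illustrative, not Buildkite's actual implementation:

```ruby
require "securerandom"

# (a) Atomic write: write to a unique temporary path, then rename into
# place. rename(2) is atomic on POSIX within one filesystem, so readers
# see either the old complete file or the new complete file.
def atomic_write(path, contents)
  tmp = "#{path}.tmp.#{Process.pid}.#{SecureRandom.hex(4)}"
  File.write(tmp, contents)
  File.rename(tmp, path)
end

atomic_write("/app/env/production.env", "DATABASE_URL=postgres://db.example/app\n")

# (b) Fail fast at boot if critical configuration is missing or empty,
# rather than letting the PostgreSQL client silently fall back to a
# local Unix socket that does not exist in the container.
REQUIRED_ENV = %w[DATABASE_URL].freeze # illustrative list

missing = REQUIRED_ENV.select { |name| ENV[name].to_s.empty? }
abort("FATAL: missing critical environment variables: #{missing.join(", ")}") unless missing.empty?
```

Together these close both sides of the failure: the atomic rename means the race can no longer corrupt the file, and the boot assertion means a worker that somehow still boots without its database configuration dies immediately instead of silently taking jobs.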

Read the full incident report →

Minor February 3, 2026

Delays with GitHub Webhooks

Detected by Pingoru
Feb 03, 2026, 03:36 PM UTC
Resolved
Feb 03, 2026, 06:44 PM UTC
Duration
3h 7m
Affected: GitHub Commit Status Notifications, Job Queue, Hosted Agents
Timeline · 5 updates
  1. investigating Feb 03, 2026, 03:36 PM UTC

    We are currently investigating delays with receiving Webhooks from GitHub.

  2. investigating Feb 03, 2026, 04:20 PM UTC

    We are continuing to investigate this issue.

  3. monitoring Feb 03, 2026, 05:05 PM UTC

    We're continuing to see delays in webhook delivery from GitHub, which may result in delays in Builds triggering and Commit Status updates.

  4. monitoring Feb 03, 2026, 06:15 PM UTC

    We’re seeing improvements with webhook delivery from GitHub, and we continue to monitor the issue.

  5. resolved Feb 03, 2026, 06:44 PM UTC

    This incident has been resolved.

Read the full incident report →

Looking to track Buildkite downtime and outages?

Pingoru polls Buildkite's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when Buildkite reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track Buildkite alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring Buildkite for free

5 free monitors · No credit card required