Is Buildkite down?

Last checked 3m ago
Current status
Buildkite is up

No incidents right now.

Official status page: https://www.buildkitestatus.com · Polled every 5 minutes · 2 components tracked

Buildkite is operational right now. Last checked 3m ago; the most recent incident resolved 24d ago.

Real-time Buildkite status, recent outages, and incident history — pulled directly from Buildkite's official status page at https://www.buildkitestatus.com every 5 minutes. Pingoru tracks 2 Buildkite services and has captured 21 incidents in the last 90 days (86.94% uptime). Get email, Slack, Discord, or webhook alerts the moment Buildkite reports a new incident — free for 5 monitors, no credit card.

Buildkite uptime: 86.94% · past 90 days

Recent outages & incidents

Past 90 days
  1. Resolved 59m
    Started May 20, 2026, 04:40 PM UTC · Resolved May 20, 2026, 05:39 PM UTC
    GitHub Commit Status Notifications, Email Notifications, Slack Notifications, Webhook Notifications
    Timeline · 5 updates
    • investigating · May 20, 2026, 04:40 PM UTC

      We are investigating delays to notifications across all customers

    • identified · May 20, 2026, 05:06 PM UTC

      We have identified the issue, applied mitigations, and are monitoring recovery. We have determined that only a subset of customers is affected by the notification latency.

    • monitoring · May 20, 2026, 05:26 PM UTC

      We are seeing recovery across affected customers and continue to monitor

    • resolved · May 20, 2026, 05:39 PM UTC

      The incident is resolved

    • postmortem · May 25, 2026, 07:02 AM UTC

      ## Service Impact

      A subset of our customers experienced elevated latency in our notification delivery, build dispatch and metrics services.

      ## Incident Summary

      We are in the process of migrating our underlying compute platform from AWS Fargate to AWS EKS for our production workloads. We are migrating our services in small batches so we can verify stability as we go.

      Between 15:42 and 17:33 our EKS Prometheus server began to need more memory than was available on the host where it was running. This was caused by autoscaling operations that increased the number of pods tracked by Prometheus, which in turn increased the Prometheus server's memory requirement. The host killed the Prometheus server process, which was restarted shortly after by the Kubernetes control plane. In the interim, the metrics used for application autoscaling were unavailable. The unavailable metrics meant that the affected services were not being triggered to scale up, resulting in the observed delays. Prometheus exceeded the host's available memory again soon after restarting, which caused the cycle to repeat.

      The on-call team followed prepared documentation to shift load on the affected services back to Fargate. The majority of customers saw complete recovery from 16:49. A handful of customers had developed such a large backlog during the period of higher latency that they had to be manually scaled up further. All customers saw full recovery by 17:33.

      ## Changes we're making

      We have already made the following changes to our rollout of EKS for production workloads:

      * Upsized the underlying system nodes.
      * Set higher requests and limits for the Prometheus server so it can handle more product load.
      * Reviewed and set any missing requests and limits for all new EKS resources, ensuring that EKS has all the required information to prevent accidental resource contention.
      * Added more observability and monitors for EKS pod and node health to help us identify root causes quickly during future incidents.

      We have since migrated all these services back to EKS and observed successful scaling well beyond the limits we encountered during this incident.


  2. Resolved 44m
    Started May 15, 2026, 06:51 AM UTC · Resolved May 15, 2026, 07:35 AM UTC
    Ingestion
    Timeline · 2 updates
    • monitoring · May 15, 2026, 06:51 AM UTC

      Ingestion of Test Engine execution data from an internal queue to a data store stalled. It has been resumed and is working through the backlog. Visibility of test executions from the past hour will be delayed for approximately one further hour. This has been a recurring issue; an architectural change is coming soon to eliminate this failure mode.

    • resolved · May 15, 2026, 07:35 AM UTC

      Processing of the backlog is complete.


  3. Resolved 19m
    Started May 13, 2026, 03:14 PM UTC · Resolved May 13, 2026, 03:34 PM UTC
    Web, REST API, Remote MCP Server
    Timeline · 2 updates
    • investigating · May 13, 2026, 03:14 PM UTC

      We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

    • resolved · May 13, 2026, 03:34 PM UTC

      Additional capacity was added to our Redis caches. This triggered a failover between 15:10 and 15:14 UTC, causing a spike of errors on the REST and GraphQL APIs. Customers would also have seen some errors in the Buildkite UI during this period. We have been monitoring the situation since then, and things have returned to baseline.


  4. Resolved 3h 9m
    Started May 12, 2026, 12:59 PM UTC · Resolved May 12, 2026, 04:09 PM UTC
    Ingestion
    Timeline · 2 updates
    • monitoring · May 12, 2026, 12:59 PM UTC

      We are currently experiencing delayed processing of Test Engine data. We have identified and applied a fix for the issue, but we expect delays to continue while we clear the ingestion backlog.

    • resolved · May 12, 2026, 04:09 PM UTC

      The fix was successful and the backlog has now been cleared.


  5. Resolved 2h 32m
    Started May 08, 2026, 09:10 PM UTC · Resolved May 08, 2026, 11:42 PM UTC
    Ingestion
    Timeline · 2 updates
    • investigating · May 08, 2026, 09:10 PM UTC

      We are currently experiencing delayed processing of Test Engine data. We have identified and applied a fix for the issue, but we expect delays to continue while we clear the ingestion backlog. At the current processing rate, we expect the backlog to be cleared by approximately Sat 09 May 2026 00:00 UTC.

    • resolved · May 08, 2026, 11:42 PM UTC

      The delayed backlog is now cleared and Test Engine ingestion is operating normally.


See the full Buildkite outage history

16 more incidents in the last 90 days, plus the full multi-year archive of per-service events and update timelines.
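
The duration shown next to each incident above (for example "Resolved 59m") can be approximately reproduced from its Started and Resolved timestamps. The following is a minimal Python sketch of that arithmetic, not the page's actual implementation. The names `STAMP_FORMAT`, `parse_stamp`, and `format_duration` are hypothetical, and the format string assumes every timestamp follows the "May 20, 2026, 04:40 PM UTC" layout with an abbreviated English month name.

```python
from datetime import datetime, timezone

# Assumed layout of the timestamps on this page, e.g. "May 20, 2026, 04:40 PM".
# The trailing "UTC" label is stripped before parsing and applied explicitly.
STAMP_FORMAT = "%b %d, %Y, %I:%M %p"


def parse_stamp(text: str) -> datetime:
    """Parse a timestamp such as 'May 20, 2026, 04:40 PM UTC' into an aware datetime."""
    cleaned = text.replace("UTC", "").strip()
    return datetime.strptime(cleaned, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def format_duration(started: str, resolved: str) -> str:
    """Return the elapsed time between two page timestamps as 'Xh Ym' or 'Ym'."""
    elapsed = parse_stamp(resolved) - parse_stamp(started)
    minutes = int(elapsed.total_seconds() // 60)
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest}m" if hours else f"{rest}m"


if __name__ == "__main__":
    # Incident 1 from the list above.
    print(format_duration("May 20, 2026, 04:40 PM UTC", "May 20, 2026, 05:39 PM UTC"))
    # Incident 5 from the list above.
    print(format_duration("May 08, 2026, 09:10 PM UTC", "May 08, 2026, 11:42 PM UTC"))
```

Because the displayed timestamps are rounded to the minute, this calculation can differ by a minute from the listed duration. Incident 4, for instance, works out to 3h 10m here but is listed as 3h 9m, presumably because the page computes durations from second-level timestamps.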

