Buildkite incident
AWS us-east-1 single availability zone outage
Buildkite experienced a minor incident on May 8, 2026, lasting 1d 3h. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating May 08, 2026, 01:12 AM UTC
We're aware that AWS is reporting availability zone failures in us-east-1. We are monitoring the situation but so far there is no customer impact.
- investigating May 08, 2026, 01:50 AM UTC
A small subset of our customers are experiencing delayed notifications. We are actively provisioning additional capacity for these customers. Availability zone automatic failovers are occurring in response to the outage, and this is causing some brief error blips for some customers.
- investigating May 08, 2026, 02:35 AM UTC
We are continuing to actively monitor the impacts of this availability zone outage for Buildkite customers. Some transient errors are visible due to availability zone failover events.
- investigating May 08, 2026, 04:08 AM UTC
We have provisioned additional capacity in unaffected availability zones so that they are able to support the additional load. Automatic failovers continue to occur where necessary. Some latency and transient errors will be visible to customers.
- investigating May 08, 2026, 05:10 AM UTC
We are actively moving resources out of us-east-1c. Similar brief latency and error blips will be visible to customers while these manual failovers occur.
- investigating May 08, 2026, 05:45 AM UTC
We are continuing to move infrastructure resources out of the affected AWS Availability Zone. Brief latency and error blips will unfortunately continue while these manual failovers occur.
- investigating May 08, 2026, 07:04 AM UTC
We are continuing to move infrastructure resources out of the affected AWS Availability Zone. Brief latency and error blips may continue while these manual failovers occur.
- investigating May 08, 2026, 07:22 AM UTC
We are continuing to move infrastructure resources out of the affected AWS Availability Zone. Brief latency and error blips may continue while these manual failovers occur. (Apologies if you receive duplicated notifications for this update.)
- monitoring May 08, 2026, 08:07 AM UTC
Despite the ongoing AWS incident, our own services are now stable. We are continuing to monitor our services closely, and are ready for further action should the need arise. We are also watching AWS services closely as they recover.
- resolved May 09, 2026, 04:34 AM UTC
The upstream AWS incident in us-east-1 has been resolved by AWS, and all Buildkite services are operating normally. No further customer impact is expected. We appreciate your patience during this incident.
- postmortem May 13, 2026, 05:07 AM UTC
## Service impact

On 8th May 2026, between 00:00 and 07:30 UTC, some customers would have seen intermittent errors and latency spikes across many areas of the platform.

## Incident Summary

The AWS availability zone incident in `use1-az4` triggered the automatic failover mechanisms on our database and cache clusters, as per our AZ-failure-tolerant design. During the failovers we saw some isolated request errors that were handled by client-side retries in the agent. Customer workloads were either entirely undisrupted or, in the worst case, saw elevated latency for a period of up to 5 minutes.

Throughout the incident we monitored customer impact and prepared additional resources in a healthy availability zone to manually fail over to if the automated systems proved insufficient. These were not necessary, and all our infrastructure self-healed.
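
For readers unfamiliar with the client-side retries mentioned in the incident summary, the general pattern can be sketched as follows. This is a minimal illustration and not the Buildkite agent's actual implementation: the names `TransientError`, `call_with_retries`, and `make_flaky`, along with the attempt count and delay values, are all assumptions chosen for the example. It retries a failing call with capped exponential backoff and jitter, which is one common way for a client to ride out a brief failover.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a brief failover-related failure."""


def call_with_retries(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on TransientError with capped exponential backoff.

    All parameter names and defaults are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # The delay ceiling doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random fraction of the ceiling so that many
            # clients do not all retry at the same instant.
            sleep(cap * rng())


def make_flaky(failures):
    """Return an operation that fails `failures` times and then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated failover blip")
        return state["calls"]

    return operation
```

Calling `call_with_retries(make_flaky(2))` simulates a request that fails twice during a failover and succeeds on the third attempt. The `sleep` and `rng` parameters are injectable only so the behaviour can be exercised without real waiting.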