On August 27, 2025, Buildkite experienced a major incident affecting the Agent API and Job Queue that lasted 38 minutes. The incident has been resolved; the full update timeline is below.
Affected components
- Agent API
- Job Queue
Update timeline
- identified Aug 27, 2025, 04:45 AM UTC
We've seen an increased error rate in triggering builds, and are deploying a fix. ETA 20 mins.
- monitoring Aug 27, 2025, 05:07 AM UTC
We've remediated the issue and are monitoring. Job creation and trigger builds are looking healthy.
- resolved Aug 27, 2025, 05:23 AM UTC
We've confirmed that the error rates are back to normal since 05:04 UTC.
- postmortem Aug 29, 2025, 01:08 AM UTC
# Service Impact

On 2025-08-27 from 04:28 UTC to 04:59 UTC, all customers were unable to create new builds. During this period, only the "create build" functionality was affected. Running builds continued and completed without disruption. All other Buildkite features remained fully operational.

# Incident Summary

During a routine database schema migration, a required manual step to roll out the changes wasn't executed in the correct sequence. This caused new builds to fail due to a missing database field. As a result, we observed a spike in 5XX responses from our application, and the number of created jobs dropped dramatically.

Our engineers quickly identified the issue and took immediate action by manually running the migration to create the missing field and restarting the application servers; rolling forward was found to be the faster resolution. The application recovered rapidly, allowing new builds to be created again. Most of the builds that were attempted during the outage were successfully recovered after the application restart.

# Changes we're making

We are implementing enhanced guardrails within our schema migration process to automate the required sequence of operations and prevent such process failures in the future.
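
As an illustration only, one common form of migration guardrail is a pre-deploy check that refuses to roll out application code until every column it depends on exists. The sketch below is not Buildkite's implementation: it uses SQLite so it runs standalone, and the `builds` table, the `trigger_source` column, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical mapping of table name -> columns the new application code expects.
REQUIRED_COLUMNS = {"builds": {"id", "state", "trigger_source"}}


def missing_columns(conn, required):
    """Return {table: sorted missing column names} for every unmet requirement."""
    missing = {}
    for table, columns in required.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # A table that does not exist yields no rows, so all its columns count as missing.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        present = {row[1] for row in rows}
        absent = columns - present
        if absent:
            missing[table] = sorted(absent)
    return missing


def assert_schema_ready(conn, required):
    """Raise before deploy if the schema lacks a column the new code needs."""
    missing = missing_columns(conn, required)
    if missing:
        raise RuntimeError(f"schema not ready, missing columns: {missing}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simulate the state before the migration has run: the new column is absent.
    conn.execute("CREATE TABLE builds (id INTEGER PRIMARY KEY, state TEXT)")
    print(missing_columns(conn, REQUIRED_COLUMNS))  # {'builds': ['trigger_source']}

    # Simulate running the migration, after which the check passes silently.
    conn.execute("ALTER TABLE builds ADD COLUMN trigger_source TEXT")
    assert_schema_ready(conn, REQUIRED_COLUMNS)
```

Running a check like this as a blocking step in the deploy pipeline turns an out-of-order rollout into a failed deploy rather than failed requests in production.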