Buildkite incident

Increased error rate when triggering builds

Severity: Major | Status: Resolved

Buildkite experienced a major incident on August 27, 2025, affecting the Agent API and Job Queue components and lasting 38 minutes. The incident has been resolved; the full update timeline is below.

Started: Aug 27, 2025, 04:45 AM UTC
Resolved: Aug 27, 2025, 05:23 AM UTC
Duration: 38m
Detected by Pingoru: Aug 27, 2025, 04:45 AM UTC

Affected components

Agent API, Job Queue

Update timeline

  1. identified Aug 27, 2025, 04:45 AM UTC

    We've seen an increased error rate in triggering builds, and are deploying a fix. ETA 20 mins.

  2. monitoring Aug 27, 2025, 05:07 AM UTC

    We've remediated the issue and are monitoring. Job creation and trigger builds are looking healthy.

  3. resolved Aug 27, 2025, 05:23 AM UTC

    We've confirmed that the error rates are back to normal since 05:04 UTC.

  4. postmortem Aug 29, 2025, 01:08 AM UTC

    # Service Impact

    On 2025-08-27 from 04:28 UTC to 04:59 UTC, all customers were unable to create new builds. During this period, only the "create build" functionality was affected. Running builds continued and completed without disruption. All other Buildkite features remained fully operational.

    # Incident Summary

    During a routine database schema migration, a required manual step to roll out the changes wasn't executed in the correct sequence. This caused new builds to fail due to a missing database field. As a result, we observed a spike in 5XX responses from our application, and the number of created jobs dropped dramatically.

    Our engineers quickly identified the issue and took immediate action by manually running the migration to create the missing field and restarting the application servers - rolling forward was found to be the faster resolution. The application recovered rapidly, allowing new builds to be created again. Most of the builds that were attempted during the outage were successfully recovered after the application restart.

    # Changes we're making

    We are implementing enhanced guardrails within our schema migration process to automate the required sequence of operations and prevent such process failures in the future.
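
The failure mode in the postmortem (application code running against a schema that lacks a field it expects) can be illustrated with a small pre-rollout check. This is a minimal sketch only, not Buildkite's actual tooling: it uses SQLite from the Python standard library for self-containment, and the `REQUIRED_COLUMNS` mapping, the `builds` table, and the `trigger_source` column are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical mapping of table -> columns that the new application code expects.
REQUIRED_COLUMNS = {"builds": {"id", "state", "trigger_source"}}


def missing_columns(conn, required):
    """Return {table: sorted missing column names} for columns absent from the schema."""
    missing = {}
    for table, columns in required.items():
        # PRAGMA table_info yields one row per column; index 1 holds the column name.
        # A table that does not exist yields no rows, so all its columns count as missing.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        present = {row[1] for row in rows}
        absent = columns - present
        if absent:
            missing[table] = sorted(absent)
    return missing


def assert_schema_ready(conn, required):
    """Raise before rollout if the expected migration has not been applied."""
    missing = missing_columns(conn, required)
    if missing:
        raise RuntimeError(f"schema not ready, missing columns: {missing}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Schema as it looks before the (hypothetical) migration runs.
    conn.execute("CREATE TABLE builds (id INTEGER PRIMARY KEY, state TEXT)")
    print(missing_columns(conn, REQUIRED_COLUMNS))

    # Apply the migration, then the check passes without raising.
    conn.execute("ALTER TABLE builds ADD COLUMN trigger_source TEXT")
    assert_schema_ready(conn, REQUIRED_COLUMNS)
```

Running a check of this kind as a gate in the deploy pipeline, ahead of restarting application servers, is one way to turn an out-of-sequence manual step into a blocked rollout instead of a production error. Whether the guardrails mentioned above take this form is not stated in the source.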