Spacelift incident

Incident affecting run scheduling

Major, Resolved

Spacelift experienced a major incident on July 23, 2025 affecting Event processing and Public workers, lasting 1h 37m. The incident has been resolved; the full update timeline is below.

Started: Jul 23, 2025, 08:39 AM UTC
Resolved: Jul 23, 2025, 10:16 AM UTC
Duration: 1h 37m
Detected by Pingoru: Jul 23, 2025, 08:39 AM UTC

Affected components

Event processing, Public workers

Update timeline

  1. investigating Jul 23, 2025, 08:39 AM UTC

    We are currently investigating an incident affecting run scheduling and other asynchronous processing in Spacelift. The incident is causing delays with starting runs and updating statuses in VCS systems. As soon as we have more information we will post a further update.

  2. identified Jul 23, 2025, 08:49 AM UTC

    The problem was caused by a large backlog of messages building up on one of our message queues. The backlog has cleared and the system appears to be operating correctly again. We are currently trying to understand the root cause to prevent it happening again.

  3. monitoring Jul 23, 2025, 09:40 AM UTC

    We have identified the root cause of the incident. We are currently monitoring to make sure the system is stable, and we will take steps to prevent it occurring again.

  4. resolved Jul 23, 2025, 10:16 AM UTC

    After continuing to monitor the system we have confirmed that it is stable again.
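
The timeline above can be cross-checked against the reported duration with a little timestamp arithmetic. The following is a minimal sketch, not anything published by Spacelift: the timestamps are copied from the four updates, while the helper names (`parse`, `format_duration`) and the format string are illustrative assumptions. It assumes an English locale, since `%b` and `%p` are locale-dependent.

```python
from datetime import datetime, timezone

# Timestamps copied from the update timeline (all UTC).
# FMT is an assumed format string matching "Jul 23, 2025, 08:39 AM".
FMT = "%b %d, %Y, %I:%M %p"
updates = [
    ("investigating", "Jul 23, 2025, 08:39 AM"),
    ("identified", "Jul 23, 2025, 08:49 AM"),
    ("monitoring", "Jul 23, 2025, 09:40 AM"),
    ("resolved", "Jul 23, 2025, 10:16 AM"),
]


def parse(ts):
    """Parse a timeline timestamp and mark it as UTC (hypothetical helper)."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


def format_duration(delta):
    """Render a timedelta as 'Xh Ym', or 'Ym' when under an hour."""
    minutes = int(delta.total_seconds() // 60)
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins}m" if hours else f"{mins}m"


times = [(status, parse(ts)) for status, ts in updates]

# Total time from the first update to the last.
total = format_duration(times[-1][1] - times[0][1])

# Time spent between each pair of consecutive updates.
phases = [
    (a[0], b[0], format_duration(b[1] - a[1]))
    for a, b in zip(times, times[1:])
]

print("total:", total)
for start, end, length in phases:
    print(f"{start} -> {end}: {length}")
```

Under these assumptions the total matches the reported duration of 1h 37m, and the longest gap between updates is the 51 minutes between "identified" and "monitoring".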