Upstash incident

QStash US Region Service Disruption

Critical Resolved

Upstash experienced a critical incident on May 8, 2026 affecting US-EAST-1, lasting 22m. The incident has been resolved; the full update timeline is below.

Started
May 08, 2026, 09:46 AM UTC
Resolved
May 08, 2026, 10:08 AM UTC
Duration
22m
Detected by Pingoru
May 08, 2026, 09:46 AM UTC

Affected components

US-EAST-1

Update timeline

  1. investigating May 08, 2026, 09:46 AM UTC

    We are currently investigating the issue.

  2. investigating May 08, 2026, 09:46 AM UTC

    We are continuing to investigate this issue.

  3. monitoring May 08, 2026, 10:06 AM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved May 08, 2026, 10:08 AM UTC

    This incident has been resolved; we will publish an RCA soon.

  5. postmortem May 12, 2026, 08:25 AM UTC

    **Root Cause Analysis**

    On **April 24**, we deployed a more optimized scheduler implementation in the **US East (N. Virginia)** region. On **May 8**, a user who had active schedules deleted their account. Under normal behavior, scheduled tasks associated with a deleted account should wake up, detect that the account no longer exists, and exit after performing cleanup. Due to a bug introduced in the new scheduler implementation, this code path did not return early as intended. Execution continued and resulted in a nil pointer dereference.

    A second issue then amplified the impact. When a panic occurs in the scheduler, it is designed to be recovered, logged, and isolated so that the process remains healthy. Because of another bug in the panic recovery path, the panic was not properly caught, which caused the worker process handling the scheduled job to terminate. After that process exited, another worker picked up responsibility for delivering the same scheduled task. Since the same faulty execution path was still present, that worker also failed. This created a cascading failure pattern across workers attempting to process the affected schedules.

    **Resolution**

    We deployed two fixes:

    * Added the missing early return in the deleted-account cleanup path, preventing the nil pointer dereference.
    * Corrected the panic recovery logic so that future panics are safely recovered, logged, and reported without causing worker processes to terminate.

    With these changes in place, the affected execution path is now safe. Even if a future bug triggers a panic in this area, it will be isolated and reported rather than causing process-level failure.
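The two failure modes in the RCA (a cleanup branch that falls through, and a panic that escapes recovery) can be illustrated with a small Go sketch. This is not Upstash's code. It is written in Go on the assumption that the scheduler uses Go's panic/recover model, which the RCA's wording ("nil pointer dereference", "panic") suggests but does not state. The types `Account` and `Schedule`, the in-memory `accounts` map, and the functions `runScheduledTask`, `buggyRunScheduledTask`, `safeRun`, and `worker` are all hypothetical. The sketch contrasts a deleted-account branch that cleans up and returns early with the same branch missing its `return`, and wraps each task in a deferred `recover` so that a panic becomes a logged error instead of terminating the worker.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical types: the RCA does not describe QStash's internal data model.
type Account struct {
	ID       string
	Endpoint string
}

type Schedule struct {
	ID        string
	AccountID string
}

// accounts simulates the account store; a missing key means the account was deleted.
var accounts = map[string]*Account{
	"acct-live": {ID: "acct-live", Endpoint: "https://example.com/hook"},
}

// cleanedUp records schedule IDs removed by the cleanup path (demo only).
var cleanedUp []string

// runScheduledTask is the corrected handler: when the owning account is gone,
// it cleans up the orphaned schedule and returns early.
func runScheduledTask(s Schedule) (string, error) {
	acct := accounts[s.AccountID] // nil if the account was deleted
	if acct == nil {
		cleanedUp = append(cleanedUp, s.ID)
		return "", nil // the early return corresponding to the first fix
	}
	return "delivered to " + acct.Endpoint, nil
}

// buggyRunScheduledTask shows the class of bug described in the RCA: the
// cleanup branch runs but does not return, so execution falls through.
func buggyRunScheduledTask(s Schedule) (string, error) {
	acct := accounts[s.AccountID]
	if acct == nil {
		cleanedUp = append(cleanedUp, s.ID)
		// missing return: execution continues with acct == nil
	}
	return "delivered to " + acct.Endpoint, nil // nil pointer dereference panics here
}

// safeRun runs one task and converts a panic into a logged error. The
// deferred recover only catches panics raised in this same goroutine.
func safeRun(s Schedule, task func(Schedule) (string, error)) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("schedule %s: recovered panic: %v", s.ID, r)
			log.Print(err) // report instead of crashing the process
		}
	}()
	return task(s)
}

// worker processes a queue of schedules; because each task runs under
// safeRun, one panicking task does not stop the remaining ones.
func worker(queue []Schedule, task func(Schedule) (string, error)) (ok, failed int) {
	for _, s := range queue {
		if _, err := safeRun(s, task); err != nil {
			failed++
			continue
		}
		ok++
	}
	return ok, failed
}

func main() {
	queue := []Schedule{
		{ID: "sched-1", AccountID: "acct-deleted"},
		{ID: "sched-2", AccountID: "acct-live"},
	}
	ok, failed := worker(queue, buggyRunScheduledTask)
	fmt.Printf("buggy handler: ok=%d failed=%d\n", ok, failed)
	ok, failed = worker(queue, runScheduledTask)
	fmt.Printf("fixed handler: ok=%d failed=%d\n", ok, failed)
}
```

With the buggy handler, the deleted-account schedule panics, but `safeRun` turns the panic into an error and the worker still delivers the live schedule; with the fixed handler, both schedules complete without error. One Go detail matters for this design: `recover` only stops a panic raised in the same goroutine, so a wrapper like `safeRun` has to run inside each worker goroutine. A panic in a goroutine that has no deferred `recover` terminates the whole process. The RCA does not say whether this was the specific recovery bug Upstash fixed.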