Astronomer incident

Stuck worker pods resulting in tasks failing in the queued state

Major Resolved

Astronomer experienced a major incident on April 18, 2025 affecting Scheduling and Running DAGs and Tasks, lasting 15h 46m. The incident has been resolved; the full update timeline is below.

Started
Apr 18, 2025, 02:25 PM UTC
Resolved
Apr 19, 2025, 06:12 AM UTC
Duration
15h 46m
Detected by Pingoru
Apr 18, 2025, 02:25 PM UTC

Affected components

Scheduling and Running DAGs and Tasks

Update timeline

  1. investigating Apr 18, 2025, 02:25 PM UTC

    In some deployments, worker pods are getting stuck in the initialization state for an extended period of time. Due to this, queued tasks are unable to run and fail. This is not affecting all deployments. We are investigating which deployments are affected and why.

  2. investigating Apr 18, 2025, 07:12 PM UTC

    We are continuing to investigate this issue.

  3. investigating Apr 18, 2025, 09:36 PM UTC

    The incident is resolved.

  4. resolved Apr 19, 2025, 06:12 AM UTC

    This incident has been resolved.