Astronomer incident
Stuck worker pods resulting in tasks failing in the queued state
Astronomer experienced a major incident on April 18, 2025 affecting Scheduling and Running DAGs and Tasks, lasting 15h 46m. The incident has been resolved; the full update timeline is below.
Affected components
Scheduling and Running DAGs and Tasks
Update timeline
- investigating Apr 18, 2025, 02:25 PM UTC
In some deployments, worker pods are stuck in the initialization state for an extended period. As a result, queued tasks cannot run and fail. Not all deployments are affected. We are investigating which deployments are affected and why.
- investigating Apr 18, 2025, 07:12 PM UTC
We are continuing to investigate this issue.
- investigating Apr 18, 2025, 09:36 PM UTC
The incident is resolved.
- resolved Apr 19, 2025, 06:12 AM UTC
This incident has been resolved.