Astronomer incident

AWS Outage Impacting Astro Deployments

Major, Resolved

Astronomer experienced a major incident on October 20, 2025 affecting Scheduling and Running DAGs and Tasks, Cloud UI, and Cluster Management, lasting 11h 48m. The incident has been resolved; the full update timeline is below.

Started
Oct 20, 2025, 10:00 AM UTC
Resolved
Oct 20, 2025, 09:49 PM UTC
Duration
11h 48m
Detected by Pingoru
Oct 20, 2025, 10:00 AM UTC
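
The duration can be cross-checked from the Started and Resolved timestamps. The following is a minimal sketch, assuming the timestamp format shown on this page and an English locale for month abbreviations; the helper names `parse_utc` and `format_duration` are hypothetical and not part of any Astronomer or Pingoru tooling. Because the displayed timestamps are rounded to the minute, the computed value (11h 49m) can differ by a minute from the reported 11h 48m, which was presumably derived from second-level timestamps.

```python
from datetime import datetime, timezone

# Format of the timestamps as displayed on this page, without the " UTC" suffix.
# Note: %b (month abbreviation) is locale-dependent; an English locale is assumed.
STAMP_FORMAT = "%b %d, %Y, %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp such as 'Oct 20, 2025, 10:00 AM UTC' (hypothetical helper)."""
    naive = datetime.strptime(stamp.removesuffix(" UTC"), STAMP_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Return the elapsed time between two datetimes as 'Hh Mm' (hypothetical helper)."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


started = parse_utc("Oct 20, 2025, 10:00 AM UTC")
resolved = parse_utc("Oct 20, 2025, 09:49 PM UTC")
print(format_duration(started, resolved))  # 11h 49m from minute-rounded timestamps
```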

Affected components

Scheduling and Running DAGs and Tasks, Cloud UI, Cluster Management

Update timeline

  1. investigating Oct 20, 2025, 10:00 AM UTC

    We are aware of an ongoing AWS outage in the US-EAST-1 (N. Virginia) region that is impacting multiple AWS services and related infrastructure components. Customers with Astro clusters and deployments hosted on AWS may experience degraded performance, failed task executions, or delays in accessing their environments. Our team is actively monitoring the situation and assessing the impact across affected deployments. For real-time updates from AWS, please refer to their Service Health Dashboard (https://health.aws.amazon.com/health/status). The next update will be provided as more information becomes available.

  2. investigating Oct 20, 2025, 10:00 AM UTC

    We are continuing to investigate this issue.

  3. investigating Oct 20, 2025, 01:08 PM UTC

    We’re aware that the Airflow UI has been running very slowly following the recent AWS outage. Our team is actively investigating the issue.

  4. investigating Oct 20, 2025, 02:27 PM UTC

    The AWS outage is affecting an internal tool, which is causing Airflow UI slowness in clusters running on all clouds (not just AWS). Our development team is working on a fix.

  5. investigating Oct 20, 2025, 04:12 PM UTC

    We are continuing to investigate this issue.

  6. monitoring Oct 20, 2025, 04:30 PM UTC

    We have made a hotfix update to Astro to relieve the Airflow UI slowness in Azure, GCP, and in AWS regions other than us-east-1. We are continuing to monitor the impact of the change, but early signs indicate that the speed of the UI should be improving. This update has no effect on the issues unique to deployments in AWS us-east-1.

  7. monitoring Oct 20, 2025, 06:19 PM UTC

    We have observed pods that were previously stuck in the pending state slowly getting scheduled on EC2 nodes following mitigations applied by the AWS team. This should start resolving issues with task execution. We are actively monitoring the situation.

  8. resolved Oct 20, 2025, 09:49 PM UTC

    All Astronomer components have returned to a healthy state.