Astronomer Outage History

Astronomer had 48 outages in the last 2 years totaling 265h 6m of downtime — averaging 2 incidents per month.

Each outage recorded since March 26, 2025 is summarised below, with incident details, duration, and resolution information.
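
As a rough check on the summary figures, the sketch below derives the monthly incident rate and the mean downtime per incident from the totals stated above. The 24-month window is an assumption taken from the "last 2 years" wording, and the variable names are illustrative only.

```python
from datetime import timedelta

# Totals as stated in the summary above.
outages = 48
total_downtime = timedelta(hours=265, minutes=6)

# Assumption: "the last 2 years" is treated as a 24-month window.
months = 24

incidents_per_month = outages / months
mean_downtime = total_downtime / outages

# Express the mean as whole hours and minutes, dropping leftover seconds.
mean_minutes = int(mean_downtime.total_seconds() // 60)
mean_hours, mean_rem_minutes = divmod(mean_minutes, 60)

print(f"{incidents_per_month:.1f} incidents per month")
print(f"mean downtime per incident: {mean_hours}h {mean_rem_minutes}m")
```

Under that assumption the figures work out to 2.0 incidents per month and a mean of roughly 5h 31m per incident.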

Source: https://status.astronomer.io

Major October 30, 2025

Deployment Health Incidents are not available

Detected by Pingoru: Oct 30, 2025, 12:58 AM UTC
Resolved: Oct 30, 2025, 04:27 AM UTC
Duration: 3h 28m

Affected: Deployment Management

Timeline · 2 updates

investigating Oct 30, 2025, 12:58 AM UTC

https://www.astronomer.io/docs/astro/deployment-health-incidents
resolved Oct 30, 2025, 04:27 AM UTC

Deployment Health Incidents have been temporarily disabled.

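
The Duration figure in each entry follows from its Detected and Resolved timestamps. A minimal sketch of that calculation is given here, assuming the timestamp format used on this page; the helper names `parse_stamp` and `format_duration` are hypothetical. Since the page shows timestamps only to the minute, the result can differ slightly from the listed duration (for the entry above it gives 3h 29m against the listed 3h 28m, presumably because the underlying timestamps carry seconds).

```python
from datetime import datetime, timezone

# Assumption: timestamps follow the "Oct 30, 2025, 12:58 AM UTC" pattern used on this page.
STAMP_FORMAT = "%b %d, %Y, %I:%M %p UTC"

def parse_stamp(text: str) -> datetime:
    """Parse a status-page timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(text, STAMP_FORMAT).replace(tzinfo=timezone.utc)

def format_duration(start: datetime, end: datetime) -> str:
    """Render the elapsed time between two datetimes, e.g. '3h 29m'."""
    total_minutes = int((end - start).total_seconds() // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours or days:
        parts.append(f"{hours}h")
    parts.append(f"{minutes}m")
    return " ".join(parts)

detected = parse_stamp("Oct 30, 2025, 12:58 AM UTC")
resolved = parse_stamp("Oct 30, 2025, 04:27 AM UTC")
print(format_duration(detected, resolved))
```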

Minor October 29, 2025

Astro UI performance may be severely degraded or unavailable

Detected by Pingoru: Oct 29, 2025, 11:57 PM UTC
Resolved: Oct 30, 2025, 12:58 AM UTC
Duration: 1h 1m

Affected: Deployment Access, Deployment Management, Cloud UI, Astro Observe, Cloud API, Cloud Image Repository, Cluster Management, Dashboards and Analytics

Timeline · 4 updates

investigating Oct 29, 2025, 11:57 PM UTC

We are currently investigating this issue.
investigating Oct 30, 2025, 12:51 AM UTC

Deployment Health Incidents may not be working https://www.astronomer.io/docs/astro/deployment-health-incidents
monitoring Oct 30, 2025, 12:51 AM UTC

Deployment Health Incidents may not be working https://www.astronomer.io/docs/astro/deployment-health-incidents
resolved Oct 30, 2025, 12:58 AM UTC

This incident has been resolved.


Critical October 29, 2025

Azure Front Door CDN issue causing timeouts to Astro Cloud UI

Detected by Pingoru: Oct 29, 2025, 04:05 PM UTC
Resolved: Oct 30, 2025, 12:49 AM UTC
Duration: 8h 44m

Affected: Cloud UI

Timeline · 4 updates

investigating Oct 29, 2025, 04:05 PM UTC

We are currently investigating an apparent issue with Azure's Front Door CDN that is causing DNS timeouts and unavailability of the Astro Cloud UI. Airflow deployments are unaffected and continue to process tasks. We have updated our DNS routing to bypass Front Door for now and are seeing access to the Cloud UI being restored.
monitoring Oct 29, 2025, 04:05 PM UTC

A fix has been implemented and we are monitoring the results.
monitoring Oct 29, 2025, 04:35 PM UTC

We are continuing to monitor for any further effects from this Azure outage. You can follow Azure's status here: https://azure.status.microsoft/en-us/status
resolved Oct 30, 2025, 12:49 AM UTC

Resolved per Azure status: https://azure.status.microsoft/en-us/status


Minor October 28, 2025

Issues with EC2 scale-up time in the AWS us-east-1 use1-az2 Availability Zone, causing node scale-up issues for some deployments in AWS us-east-1

Detected by Pingoru: Oct 28, 2025, 06:05 PM UTC
Resolved: Oct 28, 2025, 11:51 PM UTC
Duration: 5h 46m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 3 updates

investigating Oct 28, 2025, 06:05 PM UTC

We are currently investigating this issue and have been notified by AWS of issues with VM scale ups in AWS us-east-1, specifically the use1-az2 AZ. We are working to identify affected deployments, as well as on mitigations.
monitoring Oct 28, 2025, 07:39 PM UTC

Customer deployments should no longer be affected; we are monitoring to confirm resolution.
resolved Oct 28, 2025, 11:51 PM UTC

This incident has been resolved.


Major October 20, 2025

AWS Outage Impacting Astro Deployments

Detected by Pingoru: Oct 20, 2025, 10:00 AM UTC
Resolved: Oct 20, 2025, 09:49 PM UTC
Duration: 11h 48m

Affected: Scheduling and Running DAGs and Tasks, Cloud UI, Cluster Management

Timeline · 8 updates

investigating Oct 20, 2025, 10:00 AM UTC

We are aware of an ongoing AWS outage in the US-EAST-1 (N. Virginia) region that is impacting multiple AWS services and related infrastructure components. Customers with Astro clusters and deployments hosted on AWS may experience degraded performance, failed task executions, or delays in accessing their environments. Our team is actively monitoring the situation and assessing the impact across affected deployments. For real-time updates from AWS, please refer to their Service Health Dashboard (https://health.aws.amazon.com/health/status). The next update will be provided as more information becomes available.
investigating Oct 20, 2025, 10:00 AM UTC

We are continuing to investigate this issue.
investigating Oct 20, 2025, 01:08 PM UTC

We’re aware that the Airflow UI has been running very slowly following the recent AWS outage. Our team is actively investigating the issue.
investigating Oct 20, 2025, 02:27 PM UTC

The AWS outage is affecting an internal tool, which is causing Airflow UI slowness in clusters running on all clouds (not just AWS). Our development team is working on a fix.
investigating Oct 20, 2025, 04:12 PM UTC

We are continuing to investigate this issue.
monitoring Oct 20, 2025, 04:30 PM UTC

We have made a hotfix update to Astro to relieve the Airflow UI slowness in Azure, GCP, and in AWS regions other than us-east-1. We are continuing to monitor the impact of the change, but early signs indicate that the speed of the UI should be improving. This update has no effect on the issues unique to deployments in AWS us-east-1.
monitoring Oct 20, 2025, 06:19 PM UTC

We have observed pods that were previously stuck in the pending state slowly getting scheduled on EC2 nodes following mitigations applied by the AWS team. This should start resolving issues with task execution. We are actively monitoring the situation.
resolved Oct 20, 2025, 09:49 PM UTC

All Astronomer components have returned to a healthy state.


Minor October 6, 2025

Internal API intermittent outage causing slow UI loading, intermittent Airflow API issues

Detected by Pingoru: Oct 06, 2025, 04:08 PM UTC
Resolved: Oct 06, 2025, 10:08 PM UTC
Duration: 5h 59m

Affected: Deployment Access, Deployment Management, Cloud UI, Cloud API, Cloud Image Repository, Cluster Management, Dashboards and Analytics

Timeline · 3 updates

identified Oct 06, 2025, 04:08 PM UTC

We have identified the source of this issue and are putting mitigations in place. Airflow deployment task execution is unaffected. DAG triggering via the Airflow API may experience intermittent issues.
monitoring Oct 06, 2025, 05:52 PM UTC

The root cause of this incident has been fully identified, and our engineering teams are continuing to work on implementing mitigations. We are also continuing to monitor this issue and the associated underlying systems. Incidence of this issue has dropped significantly; however, degraded performance is still possible.
resolved Oct 06, 2025, 10:08 PM UTC

This incident has been resolved.


Critical October 1, 2025

Airflow deployments unhealthy due to scheduler issues

Detected by Pingoru: Oct 01, 2025, 05:16 PM UTC
Resolved: Oct 01, 2025, 10:17 PM UTC
Duration: 5h

Affected: Scheduling and Running DAGs and Tasks, Deployment Access

Timeline · 4 updates


Major September 4, 2025

Some clusters experiencing unintentional ephemeral storage reduction for KPOs

Detected by Pingoru: Sep 04, 2025, 09:48 PM UTC
Resolved: Sep 04, 2025, 11:40 PM UTC
Duration: 1h 51m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 3 updates

investigating Sep 04, 2025, 09:48 PM UTC

We are currently investigating some hosted clusters experiencing an unintentional reduction in ephemeral storage for default Kubernetes pods.
identified Sep 04, 2025, 10:17 PM UTC

The issue has been identified and a fix is being implemented.
resolved Sep 04, 2025, 11:40 PM UTC

This incident has been resolved.


Notice August 13, 2025

Docs for the self-hosted Astronomer Software product are down

Detected by Pingoru: Aug 13, 2025, 07:33 PM UTC
Resolved: Aug 13, 2025, 10:33 PM UTC
Duration: 2h 59m

Timeline · 3 updates

investigating Aug 13, 2025, 07:33 PM UTC

We are currently migrating our docs to a new platform. All docs are working except those for our self-hosted platform, generally referred to as Astronomer Software. You can find a working mirror here: https://clear-mousepad.cloudvent.net/docs/software/
identified Aug 13, 2025, 09:16 PM UTC

The issue has been identified and a fix is being implemented.
resolved Aug 13, 2025, 10:33 PM UTC

This incident has been resolved.


Major August 13, 2025

Issue with Live task logs with Runtime 3.0-7

Detected by Pingoru: Aug 13, 2025, 08:13 AM UTC
Resolved: Aug 19, 2025, 05:01 PM UTC
Duration: 6d 8h

Affected: Scheduling and Running DAGs and Tasks

Timeline · 3 updates

investigating Aug 13, 2025, 08:13 AM UTC

We are aware of an issue affecting live task logs with Runtime 3.0-7 and are actively investigating it.
identified Aug 14, 2025, 08:47 AM UTC

The issue has been identified and a fix is being implemented.
resolved Aug 19, 2025, 05:01 PM UTC

Astro Runtime 3.0-8 has been released, which contains the fix for this issue. Release notes are available here: https://www.astronomer.io/docs/astro/runtime-release-notes#astro-runtime-30-8


Minor August 11, 2025

False Positive SLA Violation Alerts on Astro Observe

Detected by Pingoru: Aug 11, 2025, 03:10 PM UTC
Resolved: Aug 11, 2025, 05:45 PM UTC
Duration: 2h 34m

Affected: Astro Observe

Timeline · 3 updates

investigating Aug 11, 2025, 03:10 PM UTC

We are seeing reports that users of Astro Observe are getting false positive alerts for Data Freshness SLAs. We are currently investigating the cause of these false positives.
monitoring Aug 11, 2025, 03:22 PM UTC

A fix has been implemented for the false positive alerts affecting Data Freshness SLAs in Astro Observe. Improvement has been observed, and we'll continue monitoring to ensure full resolution.
resolved Aug 11, 2025, 05:45 PM UTC

The false positive alerts for Data Freshness SLAs in Astro Observe have been resolved.


Major July 21, 2025

Customers using Azure-managed subscriptions may be unable to access the Astro UI.

Detected by Pingoru: Jul 21, 2025, 09:23 AM UTC
Resolved: Jul 21, 2025, 02:05 PM UTC
Duration: 4h 41m

Affected: Deployment Access, Deployment Management, Cloud UI, Cloud API, Dashboards and Analytics

Timeline · 5 updates

investigating Jul 21, 2025, 09:23 AM UTC

We are actively investigating the issue.
investigating Jul 21, 2025, 01:04 PM UTC

We are continuing to investigate this issue.
identified Jul 21, 2025, 01:04 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Jul 21, 2025, 01:30 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Jul 21, 2025, 02:05 PM UTC

This incident has been resolved.


Major July 18, 2025

GCP us-east1 incident may affect Astro clusters in this region

Detected by Pingoru: Jul 18, 2025, 04:25 PM UTC
Resolved: Jul 18, 2025, 06:33 PM UTC
Duration: 2h 8m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 2 updates

investigating Jul 18, 2025, 04:25 PM UTC

Google Cloud has put up a status page indicating that several services in us-east1 are affected by an incident. Astro clusters in this region will be affected by this. Clusters in other regions and other clouds should not be affected, as none of the control plane components for Astro are hosted in this region. For more information, follow the GCP incident: https://status.cloud.google.com/incidents/8cY8jdUpEGGbsSMSQk7J
resolved Jul 18, 2025, 06:33 PM UTC

Per Google, this outage is now resolved.


Major July 10, 2025

GitHub Integration Image Deploys Failing

Detected by Pingoru: Jul 10, 2025, 03:24 PM UTC
Resolved: Jul 10, 2025, 05:06 PM UTC
Duration: 1h 42m

Affected: Deployment Management

Timeline · 2 updates

investigating Jul 10, 2025, 03:24 PM UTC

We’re currently investigating an issue where GitHub Integration image deploys are failing.
resolved Jul 10, 2025, 05:06 PM UTC

The issue affecting GitHub Integration image deploys has been resolved.


Critical July 3, 2025

Some clusters are unable to start new KPO tasks

Detected by Pingoru: Jul 03, 2025, 11:19 PM UTC
Resolved: Jul 04, 2025, 02:10 AM UTC
Duration: 2h 51m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 4 updates

investigating Jul 03, 2025, 11:19 PM UTC

Some clusters that were updated today will fail to run any KPO tasks.
identified Jul 03, 2025, 11:52 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Jul 04, 2025, 01:06 AM UTC

A fix has been implemented. We are currently monitoring the results.
resolved Jul 04, 2025, 02:10 AM UTC

This incident has been resolved.


Minor July 1, 2025

Astro CLI 1.35 may unintentionally modify worker queue configurations

Detected by Pingoru: Jul 01, 2025, 01:59 PM UTC
Resolved: Jul 01, 2025, 06:25 PM UTC
Duration: 4h 26m

Affected: Deployment Management

Timeline · 4 updates

investigating Jul 01, 2025, 01:59 PM UTC

Upgrading to Astro CLI version 1.35 can lead to unintended changes in your worker queue settings, particularly when deploying with modified or missing workerQueues definitions. Please use CLI version 1.34 or lower.
identified Jul 01, 2025, 02:03 PM UTC

We’ve identified the root cause of the issue in Astro CLI version 1.35 that results in unintentional modifications to worker queues. We have yanked Astro CLI 1.35. Yanked release: https://github.com/astronomer/astro-cli/releases/tag/v1.35.0
identified Jul 01, 2025, 04:55 PM UTC

Deploys using Astro CLI version 1.35 are now blocked to prevent unintentional changes to worker queues. Please use CLI version 1.34 or lower.
resolved Jul 01, 2025, 06:25 PM UTC

We’ve verified that all affected customers have been contacted regarding the issue with Astro CLI version 1.35.0.


Major June 12, 2025

Astro clusters in GCP are having scaling issues due to a GCP outage

Detected by Pingoru: Jun 12, 2025, 06:49 PM UTC
Resolved: Jun 12, 2025, 11:23 PM UTC
Duration: 4h 34m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 5 updates

investigating Jun 12, 2025, 06:49 PM UTC

There is an active GCP outage that is affecting Astro customers using GCP.
investigating Jun 12, 2025, 06:50 PM UTC

We will continue to monitor the issue and update this page. GCP status: https://status.cloud.google.com/
identified Jun 12, 2025, 06:53 PM UTC

At this time we believe that Deployments on Azure and AWS are unaffected. We are currently checking our components to be certain. We have seen task failure rates increase on Astro Deployments on GCP. We will pass along any updates we receive from Google regarding this issue.
monitoring Jun 12, 2025, 09:49 PM UTC

Updates from status.cloud.google.com indicate most if not all of the issues are no longer occurring. On Astro, we have seen our metrics return to normal. We tentatively believe that the problems affecting Astro have passed. We continue to monitor the situation.
resolved Jun 12, 2025, 11:23 PM UTC

This incident appears to be resolved as it pertains to Astro.


Minor June 5, 2025

Creating a Connection can crash the browser tab

Detected by Pingoru: Jun 05, 2025, 11:48 PM UTC
Resolved: Jun 06, 2025, 03:18 PM UTC
Duration: 15h 30m

Affected: Cloud UI

Timeline · 3 updates

investigating Jun 05, 2025, 11:48 PM UTC

If you create certain Connection types in the Environments menu, it can crash your browser tab. Currently it affects SSH, SMTP, SFTP, Postgres, and Generic. There could be others. We are currently investigating.
identified Jun 06, 2025, 12:48 PM UTC

The issue has been identified, and our team is actively working on a fix.
resolved Jun 06, 2025, 03:18 PM UTC

This issue has been resolved.


Major May 20, 2025

403 Errors for Image Deploys

Detected by Pingoru: May 20, 2025, 04:02 PM UTC
Resolved: May 20, 2025, 05:25 PM UTC
Duration: 1h 23m

Affected: Deployment Management, Cloud Image Repository

Timeline · 3 updates

investigating May 20, 2025, 04:02 PM UTC

A small subset of customers have reported 403 errors when running the astro deploy command to deploy a new image. We are actively investigating this issue. If you are experiencing these errors, we encourage you to contact support and include the login command you used, your Astro CLI and Docker versions, and any log messages.
monitoring May 20, 2025, 04:52 PM UTC

We've implemented a mitigation for this issue and the affected clusters should see successful image pushes. We will continue to monitor for additional errors.
resolved May 20, 2025, 05:25 PM UTC

We have determined that this error is caused by cached credentials which are no longer valid after an internal change in Astro to the image registry. The fix must be performed client-side (i.e. on the machine running `astro deploy`). If you experience this error, run `docker logout` for each Astro registry that this machine has cached credentials for. By default, credentials are stored in ~/.docker/config.json, and if you are using this default setting, the following bash script will identify cached credentials and run docker logout for those that correspond to Astro registries.

    for domain in $(grep 'registry.astronomer.run' ~/.docker/config.json | awk '{print $1}' | tr -d '":' | sort | uniq); do
      docker logout "$domain"
    done
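
Where the grep-based script is awkward to run, the same idea can be sketched in Python. This is an illustrative alternative, not Astronomer's script: it assumes the cached credentials are listed under the `auths` key of the Docker config file, and the function name `astro_registries` is hypothetical. It prints the `docker logout` commands instead of executing them, so they can be reviewed first.

```python
import json
from pathlib import Path

# Substring taken from the status update above; it marks Astro image registries.
ASTRO_REGISTRY_MARKER = "registry.astronomer.run"

def astro_registries(config: dict) -> list[str]:
    """Return the sorted registry hosts in a Docker config that look like Astro registries."""
    # Assumption: registry hosts are the keys of the "auths" object.
    auths = config.get("auths", {})
    return sorted(host for host in auths if ASTRO_REGISTRY_MARKER in host)

def main() -> None:
    # Default Docker credential location mentioned in the update.
    config_path = Path.home() / ".docker" / "config.json"
    if not config_path.exists():
        print(f"No Docker config found at {config_path}")
        return
    config = json.loads(config_path.read_text())
    for host in astro_registries(config):
        # Printed rather than executed so the commands can be reviewed first.
        print(f"docker logout {host}")

if __name__ == "__main__":
    main()
```

Parsing the JSON rather than matching lines of text avoids depending on how the config file happens to be indented or quoted.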


Major May 16, 2025

Identified a configuration issue affecting Runtime 9 which is affecting DAG execution on these deployments

Detected by Pingoru: May 16, 2025, 11:28 AM UTC
Resolved: May 16, 2025, 01:09 PM UTC
Duration: 1h 41m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 3 updates

investigating May 16, 2025, 11:28 AM UTC

We are currently investigating the issue.
identified May 16, 2025, 12:12 PM UTC

Fix has been validated and is rolling out to affected deployments.
resolved May 16, 2025, 01:09 PM UTC

This incident has been resolved.


Major April 18, 2025

Stuck worker pods resulting in tasks failing in the queued state

Detected by Pingoru: Apr 18, 2025, 02:25 PM UTC
Resolved: Apr 19, 2025, 06:12 AM UTC
Duration: 15h 46m

Affected: Scheduling and Running DAGs and Tasks

Timeline · 4 updates

investigating Apr 18, 2025, 02:25 PM UTC

In some deployments, worker pods are getting stuck in the initialization state for an extended period of time. Due to this, queued tasks are unable to run and fail. This is not affecting all deployments. We are investigating which deployments are affected and why.
investigating Apr 18, 2025, 07:12 PM UTC

We are continuing to investigate this issue.
investigating Apr 18, 2025, 09:36 PM UTC

The incident is resolved.
resolved Apr 19, 2025, 06:12 AM UTC

This incident has been resolved.


Minor April 7, 2025

Cost Breakdown Dashboard data update delayed

Detected by Pingoru: Apr 07, 2025, 01:42 PM UTC
Resolved: Apr 07, 2025, 07:18 PM UTC
Duration: 5h 35m

Affected: Cloud UI

Timeline · 3 updates

identified Apr 07, 2025, 01:42 PM UTC

Data shown in the Organization Dashboards Cost Breakdown (for Enterprise customers) is delayed. As stated on the page itself, the latest data is as of April 4th. The processing to update this dashboard is currently ongoing, and we expect the data to be refreshed at approximately 16:00 UTC.
identified Apr 07, 2025, 04:43 PM UTC

Deployment cost is now up to date, but compute costs for some customers remain outdated. We are working with our billing vendor to determine the source of the issue.
resolved Apr 07, 2025, 07:18 PM UTC

This issue is now resolved except for one customer who we have contacted directly.


Major March 26, 2025

We are experiencing an issue with new task execution on AWS clusters

Detected by Pingoru: Mar 26, 2025, 04:59 AM UTC
Resolved: Mar 26, 2025, 07:42 AM UTC
Duration: 2h 42m

Affected: Scheduling and Running DAGs and Tasks, Cloud UI, Cloud API

Timeline · 4 updates
