- Detected by Pingoru
- Apr 27, 2026, 10:04 PM UTC
- Resolved
- Apr 28, 2026, 12:16 AM UTC
- Duration
- 2h 11m
Affected: Deployment Access, Deployment Management
Timeline · 3 updates
-
investigating Apr 27, 2026, 10:04 PM UTC
The issue appears to be limited to AWS clusters for now. We have a workaround in place while we investigate.
-
investigating Apr 27, 2026, 10:05 PM UTC
We are continuing to investigate this issue.
-
resolved Apr 28, 2026, 12:16 AM UTC
We rolled back a change we made to our authentication system. Any image push or config change to a deployment after the rollback caused that deployment to fix itself, which is why many users found their 403s resolved on their own after some time.
Read the full incident report →
- Detected by Pingoru
- Apr 24, 2026, 05:28 PM UTC
- Resolved
- Apr 25, 2026, 05:05 AM UTC
- Duration
- 11h 37m
Affected: Scheduling and Running DAGs and Tasks, Deployment Management, Cloud Image Repository
Timeline · 3 updates
-
identified Apr 24, 2026, 05:28 PM UTC
Azure East US has reported multi-service impact that is affecting Astro deployments in the region. For more information on the Azure outage, please visit: https://azure.status.microsoft/en-us/status
-
monitoring Apr 25, 2026, 03:56 AM UTC
Azure has fixed the issue, and services are back to normal in East US. We are monitoring to make sure everything stays stable.
-
resolved Apr 25, 2026, 05:05 AM UTC
Azure has confirmed the issue is resolved and services are back to normal. This incident is now closed. For more details, see the Azure incident history: https://azure.status.microsoft/en-us/status/history/
Read the full incident report →
- Detected by Pingoru
- Apr 17, 2026, 08:04 AM UTC
- Resolved
- Apr 17, 2026, 11:47 AM UTC
- Duration
- 3h 42m
Affected: Scheduling and Running DAGs and Tasks
Timeline · 4 updates
-
investigating Apr 17, 2026, 08:04 AM UTC
We have identified an issue with Astro Executor deployments where terminating workers can continue to accept new tasks, which in some cases can cause these tasks to fail (a defensive-retry sketch follows this incident's timeline).
-
identified Apr 17, 2026, 08:05 AM UTC
We have identified the issue and we are rolling out the fix for the affected deployments.
-
monitoring Apr 17, 2026, 10:01 AM UTC
The fix has been implemented, and we are now monitoring the deployments.
-
resolved Apr 17, 2026, 11:47 AM UTC
This issue has been resolved.
Read the full incident report →
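The failure mode above (a terminating worker picking up a task and then dying with it) typically surfaces as isolated task failures. A minimal, hedged mitigation sketch using standard Airflow task-level retries; the DAG id, schedule, and values are illustrative, not taken from the incident:

```python
# Hedged sketch: defensive task-level retries so a task that fails because its
# worker was terminated mid-shutdown is re-queued instead of failing the run.
# All names and values below are illustrative. Assumes Airflow 2.4+ (TaskFlow
# API with the `schedule` argument).
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    dag_id="example_retry_hardening",          # hypothetical DAG id
    start_date=datetime(2026, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 3,                          # re-run tasks lost with a terminating worker
        "retry_delay": timedelta(minutes=5),   # give scaling/shutdown time to settle
    },
)
def example_retry_hardening():
    @task
    def do_work() -> str:
        # Real work goes here; keep it safe to re-run, since a retry may
        # repeat a partially completed attempt.
        return "ok"

    do_work()


example_retry_hardening()
```

Retries do not remove the underlying issue; they only keep a single terminated worker from failing an otherwise healthy run.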
- Detected by Pingoru
- Apr 16, 2026, 06:04 PM UTC
- Resolved
- Apr 16, 2026, 10:25 PM UTC
- Duration
- 4h 20m
Affected: Deployment Management
Timeline · 2 updates
-
identified Apr 16, 2026, 06:04 PM UTC
We have identified that Runtime 3.2-1 is incompatible with the Astro Environment Manager. Any Connections or Variables stored at the Workspace level will not be available on deployments running 3.2-1 (an illustrative example of the affected access paths follows this incident's timeline). For this reason, we have disallowed the use of 3.2-1 for any deployments that are not already on that version. We are working to release a properly compatible 3.2-2 version as quickly as possible.
-
resolved Apr 16, 2026, 10:25 PM UTC
Astronomer Runtime 3.2-2 has been released with the fix.
Read the full incident report →
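For context on what "not available" means in practice: DAG code normally consumes Workspace-level Connections and Variables from the Environment Manager through the ordinary Airflow interfaces, so on an affected 3.2-1 deployment, lookups like the ones below would not find the Workspace-level values. A minimal sketch; the connection id and variable key are hypothetical:

```python
# Hedged illustration of the access paths affected by the 3.2-1 incompatibility:
# standard Airflow lookups that would normally resolve to Workspace-level values
# injected by the Environment Manager. "warehouse_conn" and "team_setting" are
# hypothetical identifiers; call this from inside a task, not at parse time.
from airflow.hooks.base import BaseHook
from airflow.models import Variable


def read_workspace_values() -> None:
    conn = BaseHook.get_connection("warehouse_conn")          # hypothetical Connection id
    setting = Variable.get("team_setting", default_var=None)  # hypothetical Variable key
    print(conn.host, setting)
```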
- Detected by Pingoru
- Apr 13, 2026, 09:44 PM UTC
- Resolved
- Apr 13, 2026, 10:11 PM UTC
- Duration
- 27m
Affected: Scheduling and Running DAGs and Tasks
Timeline · 3 updates
-
identified Apr 13, 2026, 09:44 PM UTC
We are currently investigating an issue affecting deployments in our Azure West Europe region. Some customer deployments are experiencing degraded performance due to a compute resource constraint in our shared infrastructure. Our engineering team has identified that the region has reached its vCPU quota limit for a specific compute type, which is preventing new resources from being provisioned. We have opened a high-priority support request with Azure to increase this quota and are actively working with our Azure representative to expedite the approval. We will provide updates as soon as we have more information about the timeline for resolution. We apologize for any disruption this may cause to your service.
-
identified Apr 13, 2026, 10:10 PM UTC
We are continuing to work on a fix for this issue.
-
resolved Apr 13, 2026, 10:11 PM UTC
**RESOLVED** – Azure West Europe Cluster Capacity Issue
We have resolved a capacity issue affecting our Azure West Europe (westeurope) shared cluster that occurred between 1:34 PM and 3:09 PM PDT on April 13. During this window, the cluster reached its vCPU quota, causing some deployments to become unhealthy. Our team immediately engaged Azure support and provisioned additional capacity as a temporary workaround, allowing all workloads to resume scheduling. All services are now fully operational and healthy. We're working with Azure on a permanent quota increase to prevent recurrence. We apologize for any disruption. If you experienced issues, please contact our support team.
Read the full incident report →
- Detected by Pingoru
- Apr 12, 2026, 10:56 PM UTC
- Resolved
- Apr 13, 2026, 11:12 AM UTC
- Duration
- 12h 15m
Affected: Astro Observe
Timeline · 4 updates
-
investigating Apr 12, 2026, 10:56 PM UTC
We are currently investigating an issue causing false-positive alerts from Astro Alerts. Our team is actively working on it.
-
identified Apr 13, 2026, 12:18 AM UTC
We have identified the issue behind the degraded Astro Alerts performance and are currently working on a fix.
-
monitoring Apr 13, 2026, 12:40 AM UTC
We have applied a fix for the degraded Astro Alerts performance and are currently monitoring it.
-
resolved Apr 13, 2026, 11:12 AM UTC
The issue has been resolved. Over the weekend, we observed degraded alert performance. With the fix now applied, any missed alerts should be delivered shortly, and new alerts will trigger without latency. Alerting has returned to normal, with no ongoing delay.
Read the full incident report →
- Detected by Pingoru
- Apr 01, 2026, 06:31 PM UTC
- Resolved
- Apr 01, 2026, 07:49 PM UTC
- Duration
- 1h 18m
Affected: Deployment Management
Timeline · 3 updates
-
identified Apr 01, 2026, 06:31 PM UTC
The issue has been identified and a fix is being implemented.
-
identified Apr 01, 2026, 06:32 PM UTC
We are continuing to work on a fix for this issue.
-
resolved Apr 01, 2026, 07:49 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 11, 2026, 09:45 AM UTC
- Resolved
- Mar 11, 2026, 09:54 AM UTC
- Duration
- 9m
Affected: Deployment Management
Timeline · 4 updates
-
investigating Mar 11, 2026, 09:45 AM UTC
We are currently investigating an issue affecting deployment management operations. Users may experience errors when attempting to perform CRUD operations on deployments. Our engineering team is actively investigating and working to restore normal functionality. We will provide further updates as more information becomes available.
-
investigating Mar 11, 2026, 09:45 AM UTC
We are continuing to investigate this issue.
-
monitoring Mar 11, 2026, 09:51 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 11, 2026, 09:54 AM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Mar 09, 2026, 01:29 PM UTC
- Resolved
- Mar 09, 2026, 04:11 PM UTC
- Duration
- 2h 41m
Affected: Scheduling and Running DAGs and Tasks, Deployment Management
Timeline · 4 updates
-
investigating Mar 09, 2026, 01:29 PM UTC
Some pods in us-central-1 are experiencing CrashLoopBackOff errors. Because some of these errors affect the components that control DAG-only deploys, newly scaled-up worker pods are also being affected in some cases. We are actively investigating.
-
investigating Mar 09, 2026, 01:53 PM UTC
Google has confirmed to Astronomer that this is an issue with GKE in the region. We are following the issue closely.
-
identified Mar 09, 2026, 03:30 PM UTC
We are seeing signs of recovery across some affected clusters.
-
resolved Mar 09, 2026, 04:11 PM UTC
The GKE incident is now resolved per Google, and we are seeing that all related issues appear to have cleared up on Astro. If you are still experiencing an issue, please raise a support ticket.
Read the full incident report →
- Detected by Pingoru
- Mar 04, 2026, 02:10 PM UTC
- Resolved
- Mar 05, 2026, 11:05 PM UTC
- Duration
- 1d 8h
Affected: Scheduling and Running DAGs and Tasks
Timeline · 4 updates
-
investigating Mar 04, 2026, 02:10 PM UTC
We are currently investigating an issue affecting Airflow worker autoscaling for some deployments hosted on the Shared Azure Cluster in the US-East2 region. As a result of this issue, worker pods may not scale up as expected in response to workload demand. This can lead to tasks remaining in a queued state longer than usual and, in some cases, failing due to queued timeouts (a hedged sketch for spotting affected tasks follows this incident's timeline). Our engineering team is actively working to identify the root cause and restore normal autoscaling behaviour. We will provide further updates as more information becomes available.
Impact:
• Affected deployments may experience delayed task execution.
• Worker pods may not scale up from zero or may not scale as expected under load.
We will continue to share updates on this page as we make progress toward resolution.
-
identified Mar 04, 2026, 02:31 PM UTC
We have identified the root cause of the autoscaling issue affecting certain deployments on the Shared Azure Cluster in the US-East2 region. The issue is related to underlying infrastructure behaviour within the cloud provider environment. Azure has acknowledged the issue and is currently rolling out a hotfix across regions. We are actively monitoring the rollout and will provide further updates as they become available.
-
monitoring Mar 05, 2026, 04:49 PM UTC
Azure confirmed that the fix has been successfully deployed to the East US 2 region. We are monitoring the results.
-
resolved Mar 05, 2026, 11:05 PM UTC
This incident has been resolved.
Read the full incident report →
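One hedged way to check whether a deployment was affected (tasks sitting in the queued state longer than usual before timing out) is to list queued task instances through the Airflow 2 stable REST API. The base URL, token, and auth scheme below are placeholders, and field names may vary slightly by Airflow version:

```python
# Hedged sketch: list task instances currently in the "queued" state across all
# DAGs via the Airflow 2 stable REST API, to spot tasks waiting longer than
# expected during an autoscaling disruption. Base URL and token are placeholders.
import requests

BASE_URL = "https://<your-deployment-hostname>/api/v1"  # placeholder
TOKEN = "<api-token>"                                    # placeholder

resp = requests.get(
    # "~" is the stable REST API wildcard for "all DAGs" / "all DAG runs".
    f"{BASE_URL}/dags/~/dagRuns/~/taskInstances",
    params={"limit": 100},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

# Filter client-side for queued tasks and print when they were queued.
for ti in resp.json().get("task_instances", []):
    if ti.get("state") == "queued":
        print(ti.get("dag_id"), ti.get("task_id"), ti.get("queued_when"))
```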
- Detected by Pingoru
- Feb 28, 2026, 07:10 PM UTC
- Resolved
- Feb 28, 2026, 09:32 PM UTC
- Duration
- 2h 22m
Affected: Deployment Access, Deployment Management, Cloud UI, Astro Observe, Cloud API
Timeline · 3 updates
-
investigating Feb 28, 2026, 07:10 PM UTC
We are still in the process of performing critical maintenance on the Astro control plane. Due to upstream dependencies on our cloud providers, our changes have taken longer than anticipated. We are actively working to resolve this as quickly as possible and expect to conclude by 20:00 UTC. As a reminder: your DAGs and tasks will continue to run normally, but you may not be able to view them in the UI during the maintenance. DAGs that use the Astro API (such as cross-deployment DAG triggering, sketched after this incident's timeline) will be impacted until this work is complete.
-
identified Feb 28, 2026, 07:36 PM UTC
Maintenance is complete. We are now working to bring Astro Observe back to fully operational status.
-
resolved Feb 28, 2026, 09:32 PM UTC
Astro Observe is fully operational. This incident is now resolved.
Read the full incident report →
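For readers unsure whether their DAGs fall into the impacted "cross-deployment DAG triggering" category: this usually means a task in one deployment calling another deployment's API to start a DAG run, roughly along the lines of the hedged sketch below. The hostname, token handling, and target DAG id are placeholders, and the example assumes the Airflow 2 stable REST API is the triggering path:

```python
# Hedged sketch of cross-deployment DAG triggering: a task in deployment A
# POSTs to deployment B's Airflow 2 stable REST API to create a DAG run.
# DAGs built in this style depend on API availability and were the kind
# impacted during the control-plane maintenance. All identifiers below are
# placeholders.
from datetime import datetime

import requests
from airflow.decorators import dag, task

TARGET_API = "https://<target-deployment-hostname>/api/v1"  # placeholder
TARGET_DAG_ID = "downstream_pipeline"                       # placeholder DAG id


@dag(dag_id="trigger_downstream", start_date=datetime(2026, 1, 1),
     schedule=None, catchup=False)
def trigger_downstream():
    @task
    def trigger_remote_run() -> str:
        # In practice the token would come from a secrets backend or an
        # Airflow Connection; "<api-token>" is a placeholder.
        resp = requests.post(
            f"{TARGET_API}/dags/{TARGET_DAG_ID}/dagRuns",
            json={"conf": {"source": "deployment-a"}},
            headers={"Authorization": "Bearer <api-token>"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["dag_run_id"]

    trigger_remote_run()


trigger_downstream()
```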
- Detected by Pingoru
- Feb 24, 2026, 09:51 PM UTC
- Resolved
- Feb 24, 2026, 10:49 PM UTC
- Duration
- 57m
Affected: Deployment Access, Deployment Management
Timeline · 4 updates
-
investigating Feb 24, 2026, 09:51 PM UTC
We are currently investigating this issue.
-
identified Feb 24, 2026, 09:58 PM UTC
See the Astro Runtime release notes for 13.5.0: https://www.astronomer.io/docs/runtime/runtime-release-notes#astro-runtime-1350
-
monitoring Feb 24, 2026, 10:27 PM UTC
Runtime 13.5.0 has been officially yanked from the registry. 13.5.1 will follow; for now, stay on 13.4.0.
-
resolved Feb 24, 2026, 10:49 PM UTC
This incident has been resolved.
Read the full incident report →
Critical · February 13, 2026
- Detected by Pingoru
- Feb 13, 2026, 10:54 AM UTC
- Resolved
- Feb 13, 2026, 12:16 PM UTC
- Duration
- 1h 22m
Affected: Astro Observe
Timeline · 4 updates
-
investigating Feb 13, 2026, 10:54 AM UTC
We are currently investigating this issue.
-
investigating Feb 13, 2026, 11:11 AM UTC
Observability processors are unavailable due to a database connectivity issue, which we are currently investigating. Observability features and Astro Alerts are unavailable in the meantime.
-
monitoring Feb 13, 2026, 11:18 AM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Feb 13, 2026, 12:16 PM UTC
This incident has been resolved.
Read the full incident report →
- Detected by Pingoru
- Feb 05, 2026, 12:41 PM UTC
- Resolved
- Feb 05, 2026, 01:42 PM UTC
- Duration
- 1h
Affected: Scheduling and Running DAGs and Tasks
Timeline · 3 updates
-
identified Feb 05, 2026, 12:41 PM UTC
We have identified an issue with some Astro Executor deployments where the graceful termination period of workers is not respected, leading to potential task failures (a retry-safety sketch follows this incident's timeline). We are rolling out a fix for the affected deployments.
-
monitoring Feb 05, 2026, 01:42 PM UTC
The fix has been applied to the affected deployments, and the issue should no longer be observed.
-
resolved Feb 05, 2026, 01:42 PM UTC
This incident has been resolved.
Read the full incident report →
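When graceful termination is not honored, a worker can be killed partway through a task, so a retried attempt may repeat work the first attempt already started. A hedged sketch of one common guard, making the task body idempotent by checking for its published output before producing it; the output path and processing step are hypothetical:

```python
# Hedged sketch: an idempotency guard so that a retry after a worker is killed
# mid-task does not duplicate already-completed work. The output path and the
# stand-in "work" are hypothetical; the pattern is write-to-temp, then publish
# atomically, then skip on re-run if the published output already exists.
from pathlib import Path

OUTPUT = Path("/tmp/daily_report.csv")  # hypothetical output location


def build_report() -> None:
    if OUTPUT.exists():
        # A previous attempt already finished and published its result;
        # skip instead of redoing the work.
        print(f"{OUTPUT} already present, skipping")
        return

    tmp = OUTPUT.with_suffix(".csv.partial")
    tmp.write_text("col_a,col_b\n1,2\n")  # stand-in for the real work
    tmp.rename(OUTPUT)                    # publish atomically so partial files never count


if __name__ == "__main__":
    build_report()
```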
- Detected by Pingoru
- Feb 02, 2026, 08:43 PM UTC
- Resolved
- Feb 03, 2026, 08:56 AM UTC
- Duration
- 12h 13m
Affected: Scheduling and Running DAGs and Tasks
Timeline · 6 updates
-
investigating Feb 02, 2026, 08:43 PM UTC
A potential Azure outage is resulting in some Azure clusters being unable to scale worker nodes. As a result, some tasks may fail. We are investigating internally and working with Azure to restore service as soon as possible.
-
investigating Feb 02, 2026, 08:43 PM UTC
We are continuing to investigate this issue.
-
identified Feb 02, 2026, 09:46 PM UTC
Azure has confirmed the outage. Please see their status page for additional information: https://azure.status.microsoft/en-us/status
-
monitoring Feb 03, 2026, 12:23 AM UTC
We are still waiting on confirmation from Azure that the issue has passed, but we are no longer seeing impact to Astronomer workloads.
-
monitoring Feb 03, 2026, 01:50 AM UTC
We are now seeing additional clusters being affected. The issue is still ongoing.
-
resolved Feb 03, 2026, 08:56 AM UTC
Mitigations have been fully applied across all affected regions by Azure, and validations confirm that configurations have been successfully updated. Customers should now see full recovery of service management operations, and dependent services are operating as expected.
Read the full incident report →