Alation Cloud Service Outage History

Alation Cloud Service had 21 outages in the last 2 years, totaling 249h 41m of downtime and averaging 0.9 incidents per month.

The 21 Alation Cloud Service outages recorded since June 30, 2024 are summarised below, each with incident details, duration, and resolution information.

Source: https://status.alationcloud.com
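
The headline figures can be spot-checked from the timestamps in each entry. The following is a minimal sketch, not part of the status page: the helper names are hypothetical, the timestamp layout is inferred from the entries in this list, and the 24-month window is an assumption based on the "last 2 years" wording.

```python
from datetime import datetime, timedelta

# Timestamp layout as printed in the entries, e.g. "May 20, 2026, 03:38 PM UTC".
# %b and %p are locale-dependent; this assumes an English (or C) locale.
STAMP_FORMAT = "%b %d, %Y, %I:%M %p UTC"


def incident_duration(detected: str, resolved: str) -> timedelta:
    """Return the elapsed time between two status-page timestamps."""
    start = datetime.strptime(detected, STAMP_FORMAT)
    end = datetime.strptime(resolved, STAMP_FORMAT)
    return end - start


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'Xh Ym', truncating any seconds."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


def incidents_per_month(incident_count: int, months: int) -> float:
    """Average incident rate, rounded to one decimal place."""
    return round(incident_count / months, 1)


# First entry in the list: May 20, 2026, 03:38 PM to 07:04 PM UTC.
first = incident_duration("May 20, 2026, 03:38 PM UTC", "May 20, 2026, 07:04 PM UTC")
print(format_duration(first))        # 3h 26m
print(incidents_per_month(21, 24))   # 0.9
```

Durations shown in days (for example "1d 5h") are rounded on the page, so summing the displayed values will not reproduce the stated total exactly; recomputing from the timestamps avoids that rounding.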

Minor May 20, 2026

Data Products Service — Elevated Latency and Errors

Detected by Pingoru: May 20, 2026, 03:38 PM UTC
Resolved: May 20, 2026, 07:04 PM UTC
Duration: 3h 26m

Affected: Americas (US-east) - Dev, Americas (US-east)

Timeline · 4 updates

investigating May 20, 2026, 03:38 PM UTC

We are investigating elevated latency and intermittent errors affecting Data Products and Alation AI. Other Alation functionality is unaffected. Mitigations are in progress and our team is actively working to identify the root cause.
identified May 20, 2026, 04:07 PM UTC

We have identified the root cause of the elevated latency affecting Data Products and Alation AI. The team is working on mitigation.
monitoring May 20, 2026, 05:25 PM UTC

The underlying issue has been mitigated. Data Products and Alation AI are returning to normal operation. We are continuing to monitor service health to confirm full recovery.
resolved May 20, 2026, 07:04 PM UTC

This incident has been resolved. Data Products and Alation AI are operating normally. A detailed RCA will be shared shortly.


Minor April 24, 2026

Service Disruption Affecting Agent Interactions

Detected by Pingoru: Apr 24, 2026, 01:00 AM UTC
Resolved: Apr 25, 2026, 06:52 AM UTC
Duration: 1d 5h

Affected: Americas (US-east), Americas (US-west), Canada (Montreal), EMEA (Ireland), EMEA (Frankfurt), APAC (Sydney), APAC (Singapore), APAC (Tokyo), APAC (Mumbai)

Timeline · 4 updates

investigating Apr 24, 2026, 02:38 PM UTC

We recently experienced a service disruption that caused agent interactions to fail. The issue was traced to an expired token, which prevented a backend service from writing query results to storage. We have applied a temporary mitigation by recycling the affected tenant, which has restored normal functionality. Our team is actively working on a permanent fix to prevent this issue from recurring. Impact: This issue affected agent interactions only. All other platform functionality remained unaffected.
identified Apr 24, 2026, 02:39 PM UTC

We have applied a temporary mitigation by recycling the affected tenant, which has restored normal functionality. Our team is actively working on a permanent fix to prevent this issue from recurring.
monitoring Apr 24, 2026, 05:47 PM UTC

The issue causing agent interaction failures has been resolved, and the agent system is now fully functional. We are actively monitoring system health to ensure continued stability.
resolved Apr 25, 2026, 06:52 AM UTC

This incident has been resolved.


Notice April 23, 2026

Alation Service Degradation - Catalog editor

Detected by Pingoru: Apr 23, 2026, 03:57 PM UTC
Resolved: Apr 23, 2026, 03:57 PM UTC
Duration: —

Affected: Americas (US-east) - Dev, Americas (US-east), Americas (US-west), Americas (US-west) - Dev, Canada (Montreal), Canada (Montreal) - Dev, EMEA (Ireland), EMEA (Ireland) - Dev, EMEA (Frankfurt) - Dev, EMEA (Frankfurt), APAC (Sydney), APAC (Sydney) - Dev, APAC (Singapore) - Dev, APAC (Singapore), APAC (Tokyo), APAC (Tokyo) - Dev, APAC (Mumbai), APAC (Mumbai) - Dev, PoV (Proof of Value)

Timeline · 1 update

resolved Apr 23, 2026, 03:57 PM UTC

We discovered an issue where Rich Text Editor fields across the Catalog were not displaying content correctly. The issue has been resolved by rolling back the problematic deployment.


Notice March 11, 2026

Alation service degradation - Alation agent

Detected by Pingoru: Mar 11, 2026, 11:47 PM UTC
Resolved: Mar 12, 2026, 06:23 AM UTC
Duration: 6h 35m

Affected: Americas (US-east), Americas (US-west)

Timeline · 2 updates

monitoring Mar 11, 2026, 11:47 PM UTC

A service interruption to Alation agent was encountered by some customers in the US-East-1 and US-West-2 regions. The service interruption has been remediated, and we are monitoring the status.
resolved Mar 12, 2026, 06:23 AM UTC

We have not seen the error reoccur in the last few hours; we are marking the incident as resolved.


Major February 19, 2026

Alation Service Degradation

Detected by Pingoru: Feb 19, 2026, 11:01 AM UTC
Resolved: Feb 19, 2026, 08:42 PM UTC
Duration: 9h 40m

Affected: Americas (US-east)

Timeline · 8 updates

Read the full incident report →

Minor February 2, 2026

Degraded Service - Alation.

Detected by Pingoru: Feb 02, 2026, 01:01 PM UTC
Resolved: Feb 02, 2026, 04:51 PM UTC
Duration: 3h 49m

Affected: Americas (US-east)

Timeline · 5 updates

investigating Feb 02, 2026, 02:01 PM UTC

Alation service has recovered for most tenants and is operating normally. However, a limited number of tenants are still experiencing service disruption (login failures, timeouts, or degraded performance). Our engineering team is prioritizing work to restore service for the remaining affected tenants.
identified Feb 02, 2026, 02:18 PM UTC

Service has been restored for the majority of tenants. We have identified an issue affecting a small subset of tenants that are still experiencing errors and/or degraded performance. Targeted remediation is in progress to recover the remaining impacted tenants.
identified Feb 02, 2026, 03:01 PM UTC

Most tenants have recovered. A small subset of tenants remains impacted; we’re continuing targeted remediation.
monitoring Feb 02, 2026, 03:24 PM UTC

Service has been restored for the tenants that experienced failures. We are actively monitoring the infrastructure and application to validate expected behaviour.
resolved Feb 02, 2026, 04:51 PM UTC

Incident resolved. We’ll continue routine monitoring and will follow up if anything changes.


Notice October 21, 2025

Metadata extraction / QLI failures with BAD REQUEST HTTP response headers

Detected by Pingoru: Oct 21, 2025, 05:26 PM UTC
Resolved: Oct 21, 2025, 09:04 PM UTC
Duration: 3h 38m

Affected: Americas (US-east) - Dev, Americas (US-east)

Timeline · 3 updates

investigating Oct 21, 2025, 05:26 PM UTC

We are currently investigating an issue with the MDE Pipeline service, which is preventing data extraction and causing errors. The error is related to a connection timeout to the pipeline service. Our team is working to resolve the issue as quickly as possible. We will post progress updates as they become available.
identified Oct 21, 2025, 06:35 PM UTC

Cause has been identified and fix implemented. Working on resolution.
resolved Oct 21, 2025, 09:04 PM UTC

The fix has been implemented and confirmed to resolve the issue. The root cause was the AWS US-East-1 outage of the previous day (Monday, October 20).


Minor October 20, 2025

Service Degradation for EU Customers

Detected by Pingoru: Oct 20, 2025, 09:45 AM UTC
Resolved: Oct 20, 2025, 02:25 PM UTC
Duration: 4h 39m

Affected: EMEA (Ireland)

Timeline · 3 updates

identified Oct 20, 2025, 02:18 PM UTC

We are investigating reports of degraded performance impacting customers in the EU region.
identified Oct 20, 2025, 02:20 PM UTC

A subset of EU customers may experience slower load times or timeouts when accessing the Alation application, as well as delays in query execution, search indexing, and access to catalog services.
resolved Oct 20, 2025, 02:25 PM UTC

The issue that was impacting customers in the EU region has been resolved. System performance has returned to normal, and the services are now operating normally.


Critical October 20, 2025

Third-party provider outage (AWS)

Detected by Pingoru: Oct 20, 2025, 07:40 AM UTC
Resolved: Oct 20, 2025, 10:00 AM UTC
Duration: 2h 19m

Affected: Americas (US-east) - Dev, Americas (US-east)

Timeline · 4 updates

identified Oct 20, 2025, 08:07 AM UTC

We have detected elevated error rates and degraded performance across parts of the Alation platform. This is caused by a service disruption at AWS, which is affecting one or more of their core services that Alation depends on. Our own systems are healthy, but upstream instability is affecting service delivery for our users. Impact: Some users may experience slower response times, timeouts, or failures when using certain features (for example: data catalog search, ingestion jobs, API calls or dashboard refreshes). Data integrity is not impacted; no data loss or corruption has been detected. Queued operations will retry automatically once upstream services recover.
identified Oct 20, 2025, 09:52 AM UTC

AWS states that they are still working on finding the root cause and actively working on the issue.
monitoring Oct 20, 2025, 09:54 AM UTC

AWS further reports “significant signs of recovery”: most requests should now be succeeding, though some services still have latency and backlog to clear. We see early signs of Alation service recovery; we will keep you updated.
resolved Oct 20, 2025, 10:00 AM UTC

The underlying AWS service has recovered, and all Alation services have returned to normal operation for affected customers. Our teams will continue to monitor the environment to ensure continued stability.


Notice July 16, 2025

Latency across multiple regions

Detected by Pingoru: Jul 16, 2025, 03:59 PM UTC
Resolved: Jul 16, 2025, 08:59 PM UTC
Duration: 5h

Timeline · 3 updates

investigating Jul 16, 2025, 03:59 PM UTC

Our engineering teams are working to identify the issue and are actively working to mitigate the impact. We will provide updates here every 30 minutes or as new information becomes available.
monitoring Jul 16, 2025, 06:25 PM UTC

We have identified and deployed a fix across all tenants and are monitoring performance.
resolved Jul 16, 2025, 08:59 PM UTC

This incident has been resolved.


Notice July 15, 2025

Investigating latency across multiple regions

Detected by Pingoru: Jul 15, 2025, 04:27 PM UTC
Resolved: Jul 16, 2025, 05:56 AM UTC
Duration: 13h 28m

Timeline · 4 updates

investigating Jul 15, 2025, 04:27 PM UTC

We are currently investigating reports of latency across multiple regions. This is impacting availability and performance for some customers using our services. Our engineering teams are working to identify the issue and are actively working to mitigate the impact. We will provide updates here every 30 minutes or as new information becomes available.
monitoring Jul 15, 2025, 05:23 PM UTC

We have implemented a revised memory configuration, resulting in enhanced system stability and reduced latency. We will continue monitoring through the day and provide updates as appropriate.
monitoring Jul 15, 2025, 08:56 PM UTC

We are continuing to monitor environments for issues.
resolved Jul 16, 2025, 05:56 AM UTC

This incident has been resolved.


Major July 14, 2025

Investigating connectivity issues

Detected by Pingoru: Jul 14, 2025, 04:53 PM UTC
Resolved: Jul 14, 2025, 06:32 PM UTC
Duration: 1h 38m

Timeline · 4 updates

investigating Jul 14, 2025, 04:53 PM UTC

We are currently experiencing a service disruption across all regions. This is impacting availability and performance for some customers using our services. Our engineering teams are working to identify the issue and are actively working to mitigate the impact. We will provide updates here every 30 minutes or as new information becomes available.
identified Jul 14, 2025, 05:47 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Jul 14, 2025, 06:31 PM UTC

The problem has been resolved, and all of our applications are now working properly. We are continuing to monitor the system.
resolved Jul 14, 2025, 06:32 PM UTC

This incident has been resolved.


Minor June 4, 2025

We are experiencing connectivity issues on the US Cluster

Detected by Pingoru: Jun 04, 2025, 08:20 AM UTC
Resolved: Jun 04, 2025, 11:31 AM UTC
Duration: 3h 11m

Affected: Americas (US-east)

Timeline · 4 updates

investigating Jun 04, 2025, 08:20 AM UTC

We are currently experiencing a service disruption in the us-east1 region. This is impacting availability and performance for some customers using our services hosted in this region. Our engineering teams are working to identify the issue and are actively working to mitigate the impact. We will provide updates here every 30 minutes or as new information becomes available.
identified Jun 04, 2025, 08:53 AM UTC

Our investigation revealed the issue originated from underlying infrastructure limitations. We have scaled up resources in the affected region and are monitoring for stability.
monitoring Jun 04, 2025, 08:56 AM UTC

All impacted customers are back online. Our team is actively monitoring the systems to ensure everything remains stable and performs as expected.
resolved Jun 04, 2025, 11:31 AM UTC

This incident has been resolved.


Minor May 5, 2025

Alation Cloud Service - DEV unavailable for some customers in US region

Detected by Pingoru: May 05, 2025, 05:30 AM UTC
Resolved: May 08, 2025, 08:36 AM UTC
Duration: 3d 3h

Affected: Americas (US-east) - Dev, Americas (US-west) - Dev

Timeline · 13 updates

Read the full incident report →

Major November 27, 2024

Alation Cloud Service unavailable for some customers in ap-southeast-2 region

Detected by Pingoru: Nov 27, 2024, 01:00 AM UTC
Resolved: Nov 27, 2024, 01:45 AM UTC
Duration: 45m

Affected: APAC (Sydney)

Timeline · 3 updates

investigating Nov 27, 2024, 01:00 AM UTC

ACS is unavailable for some customers in the Sydney region. The issue has been identified and engineers are working on remediation.
monitoring Nov 27, 2024, 01:23 AM UTC

Issue has been identified and a fix has been implemented. We are monitoring the results.
resolved Nov 27, 2024, 05:51 AM UTC

The incident that was affecting the ACS service has been resolved.


Minor October 31, 2024

Long Running MDE/QLI jobs failing in US-WEST-2

Detected by Pingoru: Oct 31, 2024, 10:26 PM UTC
Resolved: Nov 02, 2024, 07:18 AM UTC
Duration: 1d 8h

Affected: Americas (US-west)

Timeline · 2 updates

investigating Oct 31, 2024, 10:26 PM UTC

We are currently investigating an issue affecting long-running QLI/MDE jobs in ACS US-WEST.
resolved Nov 02, 2024, 07:18 AM UTC

This incident has been resolved. Infrastructure maintenance has been completed.


Minor October 29, 2024

Metadata Extraction failure with read timeout to airflow cluster - Elevated Error

Detected by Pingoru: Oct 29, 2024, 11:30 PM UTC
Resolved: Oct 31, 2024, 06:38 PM UTC
Duration: 1d 19h

Affected: Americas (US-east), Americas (US-west), Canada (Montreal), EMEA (Ireland), EMEA (Frankfurt), APAC (Sydney), APAC (Singapore), APAC (Tokyo)

Timeline · 4 updates

investigating Oct 30, 2024, 06:39 PM UTC

We are currently investigating an issue with the MDE Pipeline service, which is preventing data extraction and causing errors. The error is related to a connection timeout to the pipeline service. Our team is working to resolve the issue as quickly as possible. We will post progress updates as they become available.
investigating Oct 30, 2024, 08:17 PM UTC

The issue is impacting the US-east region only. All other regions are fully operational. The following error message may be seen in the impacted region: "HTTPConnectionPool(host='airflow-pipeline-service.default.svc.cluster.local', port=80): Read timed out. (read timeout=1800)"
monitoring Oct 31, 2024, 03:40 AM UTC

Our engineering team has resolved the issue causing the connection timeout to the Airflow pipeline service, and the system is now functioning as expected.
resolved Oct 31, 2024, 06:38 PM UTC

The incident has been resolved and we have not seen the error reoccur during our monitoring period.


Critical October 28, 2024

Connectivity issue in us-west-2

Detected by Pingoru: Oct 28, 2024, 11:56 AM UTC
Resolved: Oct 28, 2024, 01:15 PM UTC
Duration: 1h 19m

Affected: Americas (US-west), Americas (US-west) - Dev

Timeline · 4 updates

investigating Oct 28, 2024, 11:56 AM UTC

We are investigating a connectivity issue affecting the US-West-2 region. We are working to resolve it as quickly as possible and will provide updates shortly.
identified Oct 28, 2024, 12:08 PM UTC

The issue has been identified and a fix is being implemented.
monitoring Oct 28, 2024, 12:26 PM UTC

A fix has been implemented and we are monitoring the results.
resolved Oct 28, 2024, 01:15 PM UTC

This incident has been resolved.


Minor September 9, 2024

Scheduled Jobs - Error

Detected by Pingoru: Sep 09, 2024, 04:54 AM UTC
Resolved: Sep 09, 2024, 06:12 AM UTC
Duration: 1h 17m

Affected: EMEA (Ireland), EMEA (Frankfurt), APAC (Sydney), APAC (Singapore), APAC (Tokyo)

Timeline · 3 updates

investigating Sep 09, 2024, 04:54 AM UTC

We are currently experiencing issues with our scheduled jobs, affecting some of our customers. Our team is actively investigating the cause and we will keep you updated on the progress.
identified Sep 09, 2024, 05:42 AM UTC

The issue causing elevated errors with the scheduled jobs has been identified, and our internal team is actively working on a fix.
resolved Sep 09, 2024, 06:12 AM UTC

Our team has successfully identified and rectified the root cause of the issue affecting scheduled jobs, and is now actively monitoring the situation to prevent any future occurrences. Specifically, we have determined that the issue only impacted customers in the APAC (Sydney) region.


Major July 31, 2024

Disruption Affecting Alation POV Cluster

Detected by Pingoru: Jul 31, 2024, 12:06 PM UTC
Resolved: Jul 31, 2024, 04:08 PM UTC
Duration: 4h 1m

Affected: PoV (Proof of Value)

Timeline · 4 updates

identified Jul 31, 2024, 02:06 PM UTC

Users may encounter intermittent performance issues or temporary unavailability when accessing the Alation POV Instance. We are actively working to address the issue and will keep you updated on the progress.
identified Jul 31, 2024, 02:19 PM UTC

We are still actively working to restore service; further updates will be provided as they become available.
identified Jul 31, 2024, 03:48 PM UTC

Our team is still actively engaged in the incident, and we will keep you posted once we have further information.
resolved Jul 31, 2024, 04:08 PM UTC

The intermittent issue with the POV cluster has been resolved, and we are actively monitoring to ensure the service's stability.


Notice June 30, 2024

Limited Impact - Upgrade Issue - US East Cluster

Detected by Pingoru: Jun 30, 2024, 09:00 AM UTC
Resolved: Jun 30, 2024, 12:50 PM UTC
Duration: 3h 49m

Affected: Americas (US-east)

Timeline · 2 updates

identified Jun 30, 2024, 10:21 AM UTC

We're experiencing an upgrade issue affecting a small number of users in the US-east cluster. Our team is actively working to resolve the problem and restore full functionality. We'll continue to provide updates on our progress here.
resolved Jun 30, 2024, 12:52 PM UTC

The upgrade issue that affected a small set of customers has been resolved.
