Alation Cloud Service incident

Metadata Extraction failure with read timeout to airflow cluster - Elevated Error

Minor Resolved

Alation Cloud Service experienced a minor incident on October 29, 2024, affecting Americas (US-east), Americas (US-west), and 1 more component, and lasting 1d 19h. The incident has been resolved; the full update timeline is below.

Started
Oct 29, 2024, 11:30 PM UTC
Resolved
Oct 31, 2024, 06:38 PM UTC
Duration
1d 19h
Detected by Pingoru
Oct 29, 2024, 11:30 PM UTC

Affected components

Americas (US-east), Americas (US-west), Canada (Montreal), EMEA (Ireland), EMEA (Frankfurt), APAC (Sydney), APAC (Singapore), APAC (Tokyo)

Update timeline

  1. investigating Oct 30, 2024, 06:39 PM UTC

    We are currently investigating an issue with the metadata extraction (MDE) pipeline service that is preventing data extraction and causing errors. The errors are caused by connections to the pipeline service timing out. Our team is working to resolve the issue as quickly as possible, and we will post updates as they become available.

  2. investigating Oct 30, 2024, 08:17 PM UTC

    The issue is impacting the US-east region only. All other regions are fully operational. The following error message may be seen in the impacted region: "HTTPConnectionPool(host='airflow-pipeline-service.default.svc.cluster.local', port=80): Read timed out. (read timeout=1800)"

  3. monitoring Oct 31, 2024, 03:40 AM UTC

    Our engineering team has resolved the issue that caused connections to the Airflow pipeline service to time out, and the system is now functioning as expected.

  4. resolved Oct 31, 2024, 06:38 PM UTC

    The incident has been resolved, and we have not seen the error recur during our monitoring period.
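
The error string quoted in update 2 has the shape that Python's urllib3 library uses for read timeouts, where the timeout value is given in seconds (1800 seconds is 30 minutes). For anyone scanning extraction logs for this incident, the following is a minimal sketch of how such a message could be picked apart. The names `ReadTimeoutInfo` and `parse_read_timeout` are hypothetical helpers written for this illustration and are not part of any Alation or Airflow API.

```python
import re
from typing import NamedTuple, Optional


class ReadTimeoutInfo(NamedTuple):
    """Fields recovered from a urllib3-style read-timeout message (hypothetical helper type)."""
    host: str
    port: int
    read_timeout_seconds: float


# Matches messages shaped like:
#   HTTPConnectionPool(host='example', port=80): Read timed out. (read timeout=1800)
_READ_TIMEOUT_PATTERN = re.compile(
    r"HTTPS?ConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Read timed out\. \(read timeout=(?P<timeout>[\d.]+)\)"
)


def parse_read_timeout(message: str) -> Optional[ReadTimeoutInfo]:
    """Return host, port, and timeout from a read-timeout message, or None if it does not match."""
    match = _READ_TIMEOUT_PATTERN.search(message)
    if match is None:
        return None
    return ReadTimeoutInfo(
        host=match["host"],
        port=int(match["port"]),
        read_timeout_seconds=float(match["timeout"]),
    )


# The message reported in update 2 of the timeline.
message = (
    "HTTPConnectionPool(host='airflow-pipeline-service.default.svc.cluster.local', "
    "port=80): Read timed out. (read timeout=1800)"
)
info = parse_read_timeout(message)
```

Here `info.read_timeout_seconds / 60` gives the timeout in minutes, which indicates how long each affected request waited for a response before failing.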