Cloudera incident

Cloudera AI Inference and Model registry Creation Failures

Major Resolved View vendor source →

Cloudera experienced a major incident on June 28, 2024 affecting Cloudera AI and Cloudera AI and 1 more component, lasting 9h 24m. The incident has been resolved; the full update timeline is below.

Started
Jun 28, 2024, 02:25 AM UTC
Resolved
Jun 28, 2024, 11:50 AM UTC
Duration
9h 24m
Detected by Pingoru
Jun 28, 2024, 02:25 AM UTC

Affected components

Cloudera AICloudera AICloudera AI

Update timeline

  1. investigating Jun 28, 2024, 02:25 AM UTC

    Current Status: We are currently investigating a potential issue with Cloudera AI Inference and Model registry. We will have an update within 60 mins. Customer Experience: Customers may observe issues while doing new deployments and model registry upgrade activity. Incident Start time: 5 PM UTC June 25th, 2024

  2. identified Jun 28, 2024, 03:27 AM UTC

    Current Status: Our teams have identified the source of the issue. We are working on developing and implementing a solution to restore the service(s). We will have another update within 60 mins. Customer Experience: Customers may observe issues while doing new deployments and model registry upgrade activity. Incident Start time: 5 PM UTC June 25th, 2024

  3. identified Jun 28, 2024, 04:26 AM UTC

    Current Status: Our teams have identified and fixed the issue in US region. We are still investigating the issue in AP and EU regions. We will have another update within 60 mins. Customer Experience: Customers may observe issues while doing new deployments and model registry upgrade activity. Incident Start time: 5 PM UTC June 25th, 2024

  4. identified Jun 28, 2024, 05:34 AM UTC

    Current Status: We continue investigating the issue in AP and EU regions. US region is functioning as expected. We will have another update within 60 mins. Customer Experience: Customers may observe issues while doing new deployments and model registry upgrade activity. Incident Start time: 5 PM UTC June 25th, 2024

  5. resolved Jun 28, 2024, 11:50 AM UTC

    Current Status: Our teams have successfully deployed a fix for the issue and confirmed that the issue has been resolved. If you are still experiencing issues or have any questions please raise a support case with us. A root cause analysis (RCA) will be published within seven business days. Customer Experience: Customers may observe issues while doing new deployments and model registry upgrade activity. Incident Start time: 5 PM UTC June 25th, 2024 Incident End time: 5:34 AM UTC June 28th, 2024

  6. postmortem Jul 16, 2024, 06:08 AM UTC

    On June 27, 2024, Our internal monitoring detected an issue with new deployments and upgrades of the Cloudera AI Inference and Model registry on our US control plane. Further investigation revealed configuration issues between our internal micro-services, the changes were made to address the issue and corrective actions have been implemented. We apologize for any inconvenience caused by the service disruption. We are fully committed to providing a reliable and robust platform and truly appreciate your understanding.