Cloudera incident
FreeIPA is unable to reach the US West Control Plane
Cloudera experienced a major incident on June 5, 2024 affecting Cloudera Management Console, lasting 1h 37m. The incident has been resolved; the full update timeline is below.
Affected components: Cloudera Management Console
Update timeline
- investigating Jun 05, 2024, 05:44 AM UTC
Current Status: We are currently investigating a potential issue with the FreeIPA service. We will provide an update within 60 minutes. Customer Experience: Customers may observe issues while performing control operations on environments in US West. Incident Start time: 05:00 UTC June 5th, 2024
- investigating Jun 05, 2024, 06:28 AM UTC
We are continuing to investigate this issue.
- identified Jun 05, 2024, 06:33 AM UTC
Current Status: Our teams have identified the source of the issue and are developing and implementing a solution to restore the service(s). We will provide another update within 60 minutes. Customer Experience: During this window, customers may observe issues while performing control operations on environments in US West. Incident Start time: 05:00 UTC June 5th, 2024
- monitoring Jun 05, 2024, 07:03 AM UTC
A fix has been implemented and we are monitoring the results.
- resolved Jun 05, 2024, 07:22 AM UTC
Our teams have successfully deployed a fix and confirmed that the issue has been resolved. If you are still experiencing issues or have any questions, please raise a support case with us. A root cause analysis (RCA) will be published within seven business days. Customer Experience: Customers may have observed issues when accessing the control plane for US West. Incident Start time: 01:00 UTC June 5th, 2024 Incident End time: 07:07 UTC June 5th, 2024
- postmortem Jun 12, 2024, 06:05 PM UTC
On June 5, 2024, our FreeIPA service experienced connectivity issues with our Control Plane service. Subsequent investigation identified the root cause as ingress changes implemented the previous day. We promptly took corrective measures to address the issue, and we have implemented additional checks and preemptive actions to prevent similar occurrences in the future. We apologise for any inconvenience caused by the service disruption. We are fully committed to providing a reliable and robust platform and truly appreciate your understanding.