Cloudera incident
Clusters on the US Control Plane in an unreachable state
Cloudera experienced a critical incident on November 10, 2023 affecting Cloudera Management Console, lasting 52m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Nov 10, 2023, 09:13 PM UTC
Current Status: We are currently investigating an issue with our US Control Plane. We will have another update within 60 mins. Customer Experience: Customers on the US control plan may observe DataLake & DataHub cluster displaying an unreachable state Incident Start time: ~20:00 UTC November 10th, 2023
- investigating Nov 10, 2023, 10:04 PM UTC
Current Status: Our teams have successfully deployed a fix for the issue and confirmed that the issue has been resolved. If you are still experiencing issues or have any questions please raise a support case with us. A root cause analysis (RCA) will be published within 7 Business days. Customer Experience: Customers on the US control plan may observe DataLake & DataHub cluster displaying an unreachable state Incident Start time: ~20:00 UTC November 10th, 2023 Incident End time: ~21:40 UTC November 10th, 2023
- resolved Nov 10, 2023, 10:06 PM UTC
This incident has been resolved.
- postmortem Nov 17, 2023, 09:08 PM UTC
The root cause of the issue was the expiration of certificates on our Cluster Connectivity Manager \(CCM\) instances. The certificates were renewed earlier this year however did not get fully installed across all services which caused the breakdown of communication between the services. To address this, we have enhanced our monitoring and devised a comprehensive remediation plan to detect and address similar incidents ahead of time. We regret the inconvenience this may have caused and strive to ensure every measure is taken to avoid similar issues in the future.