Cloudera incident

EU Control Plane Service Disruption

Critical Resolved

Cloudera experienced a critical incident on March 3, 2025 affecting Cloudera Management Console, lasting 2h 28m. The incident has been resolved; the full update timeline is below.

Started
Mar 03, 2025, 10:12 PM UTC
Resolved
Mar 04, 2025, 12:40 AM UTC
Duration
2h 28m
Detected by Pingoru
Mar 03, 2025, 10:12 PM UTC
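
The listed duration follows directly from the Started and Resolved timestamps above. The sketch below reproduces that arithmetic in Python; the helper names `parse_utc` and `format_duration` are illustrative assumptions, not part of any Cloudera or Pingoru tooling. It also measures from the incident start time that Cloudera's first update reports (19:58 UTC on March 3), which gives a longer window than the one measured from detection.

```python
from datetime import datetime, timezone

# Format of the status-page timestamps once the trailing " UTC" is removed,
# e.g. "Mar 03, 2025, 10:12 PM". Month names assume an English/C locale.
FMT = "%b %d, %Y, %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp such as 'Mar 03, 2025, 10:12 PM UTC' (hypothetical helper)."""
    if stamp.endswith(" UTC"):
        stamp = stamp[: -len(" UTC")]
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Render the gap between two datetimes as 'Xh Ym' (hypothetical helper)."""
    minutes = int((end - start).total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60}m"


started = parse_utc("Mar 03, 2025, 10:12 PM UTC")
resolved = parse_utc("Mar 04, 2025, 12:40 AM UTC")

# Incident start time as stated in the vendor's first update (19:58 UTC).
vendor_start = datetime(2025, 3, 3, 19, 58, tzinfo=timezone.utc)

print(format_duration(started, resolved))       # measured from detection
print(format_duration(vendor_start, resolved))  # measured from vendor-reported start
```

Measured from detection, the result matches the 2h 28m shown above; measured from the vendor-reported start, the disruption spans 4h 42m.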

Affected components

Cloudera Management Console

Update timeline

  1. identified Mar 03, 2025, 10:12 PM UTC

    Current Status: Our teams have identified the source of the issue and have partially restored the services. We continue to work on restoring the service and will provide another update within the next 60 minutes. Customer Experience: During this window, customers may experience issues logging into the console and potential slowness accessing the experiences. Incident Start time: 19:58 UTC March 3rd, 2025

  2. identified Mar 03, 2025, 11:26 PM UTC

    Current Status: Our teams are actively working on a permanent solution to fully restore the service. Please expect another update in 60 minutes. Customer Experience: During this window, customers may experience issues logging into the console and potential slowness accessing the experiences.

  3. monitoring Mar 04, 2025, 12:27 AM UTC

    Current Status: Our teams have identified the source of the issue and implemented a solution, which is currently being monitored. If you continue to experience issues logging into the console, please submit a support case for further assistance. We will post another update once we confirm that the issue is resolved on our end. Customer Experience: During this window, customers may experience issues logging into the console and potential slowness accessing some services.

  4. resolved Mar 04, 2025, 12:40 AM UTC

    Current Status: Our teams have deployed a fix and confirmed that the issue has been resolved. If you are still experiencing issues or have any questions, please raise a support case with us. A root cause analysis (RCA) will be published within seven business days. Customer Experience: During this window, customers may experience issues logging into the console and potential slowness accessing some services.

  5. postmortem Mar 19, 2025, 04:53 PM UTC

    On March 3, 2025, a service disruption occurred within our EU Control Plane. A configuration change implemented during a routine production cluster upgrade inadvertently triggered a synchronization error, leading to service unavailability. Upon detecting the issue, our engineering teams promptly allocated additional resources to the affected cluster, which restored services to an operational state. The root cause was an oversight during the standard upgrade procedure, which caused the synchronization failure. We sincerely apologize for any inconvenience this service disruption may have caused. We have implemented corrective measures and refined our upgrade protocols to mitigate the risk of similar incidents in the future. We are committed to maintaining the highest standards of service reliability and appreciate your understanding.