Cloudera incident

Intermittent FreeIPA Service disruption in US Control Plane.

Cloudera experienced a notice-level incident on April 3, 2025 affecting Cloudera Management Console, Cloudera Data Flow, and six other components, lasting 3h 44m. The incident has been resolved; the full update timeline is below.

Started: Apr 03, 2025, 11:41 AM UTC
Resolved: Apr 03, 2025, 03:25 PM UTC
Duration: 3h 44m
Detected by Pingoru: Apr 03, 2025, 11:41 AM UTC

Affected components

Cloudera Management Console
Cloudera Data Flow
Cloudera Data Engineering
Cloudera Data Warehouse
Cloudera Operational Database
Cloudera AI
Cloudera Data Hub
Cloudera Data Catalog

Update timeline

investigating Apr 03, 2025, 11:41 AM UTC

Current Status: We are currently investigating a potential issue with the US control plane service. We will have an update within 60 mins.
Customer Experience: Login and CDP operations may be impacted. Running workloads on clusters should not be impacted.
Incident Start time: 6:30 UTC April 3rd, 2025.

investigating Apr 03, 2025, 12:42 PM UTC

Current Status: Our teams are continuing their investigation to determine the source of the issue. We will have another update within 60 mins.
Customer Experience: Login and CDP operations may be impacted. Running workloads on clusters should not be impacted.
Incident Start time: 6:30 UTC April 3rd, 2025.

investigating Apr 03, 2025, 01:47 PM UTC

Current Status: Our teams are continuing their investigation to determine the source of the issue. We have ruled out a workload issue. We will have another update within 60 mins.
Customer Experience: Login and CDP operations may be impacted.
Incident Start time: 6:30 UTC April 3rd, 2025.

resolved Apr 03, 2025, 03:25 PM UTC

Current Status: Our teams have successfully deployed a fix and confirmed that the issue has been resolved. If you are still experiencing issues or have any questions, please raise a support case with us. A root cause analysis (RCA) will be published within seven business days.
Customer Experience: Login and CDP operations via the portal were impacted. API and CLI operations were working fine. Workloads were not impacted.
Incident Start time: 6:30 UTC April 3rd, 2025.
Incident End time: 15:15 UTC April 3rd, 2025.

postmortem Apr 15, 2025, 07:49 PM UTC

On April 3rd, 2025, there was an intermittent service disruption impacting login and CLI operations via the Management Console portal. Subsequent investigation traced the root cause to increased latency on internal DNS servers. Corrective measures were taken promptly to address the issue, and we have implemented additional checks and preemptive actions to prevent similar occurrences in the future. We apologise for any inconvenience caused by the service disruption. We are fully committed to providing a reliable and robust platform and truly appreciate your understanding.
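
The postmortem does not describe the additional checks Cloudera put in place. For readers who want to watch for the same class of problem in their own environment, the following is a minimal sketch of a DNS resolution latency probe. It is an illustration only, not Cloudera's implementation: the function names, the hostname, the number of attempts, and the 200 ms threshold are all assumptions.

```python
import socket
import statistics
import time
from typing import Callable, List


def measure_dns_latency(
    hostname: str,
    attempts: int = 5,
    resolver: Callable[[str], object] = lambda h: socket.getaddrinfo(h, None),
) -> List[float]:
    """Return per-attempt resolution times in milliseconds.

    Failed lookups are skipped, so the result may hold fewer than
    `attempts` samples. `resolver` is injectable to allow offline testing.
    """
    samples: List[float] = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def is_latency_elevated(samples_ms: List[float], threshold_ms: float = 200.0) -> bool:
    """True if no lookup succeeded or the median latency exceeds the threshold.

    The 200 ms default is an arbitrary, hypothetical threshold.
    """
    if not samples_ms:
        return True
    return statistics.median(samples_ms) > threshold_ms
```

A caller might run `is_latency_elevated(measure_dns_latency("internal.example.com"))` on a schedule and alert on a `True` result, where the hostname is a placeholder. Local resolver caches can mask upstream latency, so a probe like this is most useful when pointed at names that are not already cached.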