Cloudera incident

Degraded connectivity issues between Control plane and workload clusters

Minor Resolved View vendor source →

Cloudera experienced a minor incident on January 31, 2024, lasting —. The incident has been resolved; the full update timeline is below.

Started
Jan 31, 2024, 02:40 PM UTC
Resolved
Jan 31, 2024, 10:00 AM UTC
Duration
Detected by Pingoru
Jan 31, 2024, 02:40 PM UTC

Update timeline

  1. resolved Jan 31, 2024, 02:40 PM UTC

    We observed degraded connectivity issues between control plane and workload for a subset of customers. This impacted operations like orchestration, auto-scaling and administration of workloads from control plane. The incident has been resolved now and no action is needed from customers. A root cause analysis (RCA) will be published within seven business days.

  2. postmortem Feb 14, 2024, 03:20 PM UTC

    We recently implemented an upgrade to our internal proxy service, responsible for directing requests to workloads through the Cluster Connectivity Management \(CCM\) system. This update aimed to enhance the identification of healthy routes for traffic flow. However, an unforeseen issue impacted this functionality, causing routing through potentially unhealthy paths for selected configurations. The same was fixed by rolling back the release. It's important to note that this problem did not appear during testing in our lower environments, where rigorous evaluation precedes production deployments. Since the incident, we have promptly implemented additional safeguards to effectively identify and prevent similar issues from affecting production systems in the future. We sincerely apologize for any inconvenience this service disruption may have caused. We remain committed to delivering a reliable and robust platform, and appreciate your understanding.