Delinea Cloud Suite Pod34 Service Disruption
Centrify experienced a critical incident on August 7, 2025, affecting Privileged Access Service / Cloud Suite; the disruption lasted 2 hours 29 minutes. The incident has been resolved; the full update timeline is below.
Affected components
Privileged Access Service / Cloud Suite (Pod 34)
Update timeline
- investigating Aug 07, 2025, 12:39 PM UTC
We are currently experiencing a service disruption on Cloud Suite Pod34. We are seeing elevated connection counts and locking occurring on the database. As a result, we are going to restart the database within the next few minutes. This will result in a brief complete outage. We apologize for any inconvenience this may cause and appreciate your patience as we work to restore normal service. For any questions or concerns, please reach out to our support team at https://support.delinea.com.
- investigating Aug 07, 2025, 12:54 PM UTC
We are continuing to investigate this issue.
- monitoring Aug 07, 2025, 01:32 PM UTC
We have applied mitigation steps to restore service availability and are monitoring the results.
- resolved Aug 07, 2025, 03:08 PM UTC
This incident has been resolved.
- postmortem Aug 26, 2025, 03:58 AM UTC
### Incident Overview

On August 7, 2025, customers hosted on Pod 34 experienced a service outage. The pod became intermittently unresponsive, leading to delays and request failures for some customers. Other pods and services were unaffected. Service availability was restored by 9:10 AM EDT on August 7, 2025.

### Timeline of Events

* 04:19 AM EDT – Operations team received alerts of degraded performance on Pod 34; on-call engineer began investigation.
* 04:43 AM EDT – Initiated rolling reimage of Pod 34 web servers.
* 07:55 AM EDT – Restarted Pod 34 database instance.
* 09:10 AM EDT – Pod 34 stabilized and service availability returned to normal levels.

### Root Cause

The disruption occurred when the database on Pod 34 reached its maximum allowed connections, preventing new connections and causing request timeouts. A high volume of requests from one tenant triggered this condition, resulting in performance degradation for other tenants on the same pod (a “noisy neighbor” scenario).

### Resolution

Active database connections holding transaction locks were terminated, allowing new connections to be established and restoring normal service operation. (An illustrative sketch of this kind of intervention follows below.)

### Preventive Actions

* Delinea’s team is engaging directly with the tenant whose activity triggered the high connection volume.
* We are working on migrating select tenants on Pod 34 to dedicated pods to reduce resource contention.
* We are testing two permanent mitigation strategies (both sketched below):
  * **Server-side**: Cache negative UID lookups to reduce the need for new database connections.
  * **Agent-side**: Debounce redundant and concurrent lookup attempts.
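
The postmortem does not name the database engine or the exact termination procedure. As a rough illustration only, the sketch below shows one way to find and terminate lock-holding sessions, assuming a PostgreSQL-style database; the DSN and the idle-in-transaction filter are assumptions, not details from the incident.

```python
# Illustrative only: terminate sessions that hold granted locks while
# sitting idle in a transaction. Assumes PostgreSQL; DSN is a placeholder.
import psycopg2

DSN = "dbname=pod34 user=ops"  # hypothetical connection string

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        # Find sessions holding a granted lock that have been idle in a
        # transaction for over a minute (assumed threshold).
        cur.execute("""
            SELECT DISTINCT a.pid
            FROM pg_stat_activity a
            JOIN pg_locks l ON l.pid = a.pid AND l.granted
            WHERE a.state = 'idle in transaction'
              AND now() - a.state_change > interval '1 minute'
        """)
        for (pid,) in cur.fetchall():
            # pg_terminate_backend closes the session and releases its
            # locks, freeing connection slots for new clients.
            cur.execute("SELECT pg_terminate_backend(%s)", (pid,))
```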
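
The server-side mitigation is described only at a high level. A minimal sketch of negative-lookup caching, under the assumption that failed UID lookups are remembered for a short TTL (the names `lookup_uid` and `query_database` are hypothetical), might look like this:

```python
# Illustrative sketch of caching negative UID lookups so that repeated
# requests for a missing UID do not each open a new database connection.
import time

NEGATIVE_TTL = 60.0  # assumed: remember a missing UID for 60 seconds
_negative_cache: dict[str, float] = {}  # uid -> cache-entry expiry time

def lookup_uid(uid, query_database):
    """Return the record for uid, consulting the negative cache first."""
    now = time.monotonic()
    expiry = _negative_cache.get(uid)
    if expiry is not None and expiry > now:
        return None  # known-missing UID: answered without touching the DB

    record = query_database(uid)  # hits the database only on a cache miss
    if record is None:
        _negative_cache[uid] = now + NEGATIVE_TTL
    else:
        _negative_cache.pop(uid, None)
    return record
```

A production version would also bound the cache size and invalidate entries when a matching UID is later created.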
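
The agent-side mitigation can be read as a single-flight pattern: when several threads ask for the same UID at once, only one request goes out and the rest wait for its result. The sketch below is an assumption about how such debouncing could work, not Delinea’s agent code.

```python
# Illustrative single-flight coalescing of concurrent lookups per UID.
import threading

_inflight: dict[str, threading.Event] = {}
_results: dict[str, object] = {}
_lock = threading.Lock()

def coalesced_lookup(uid, do_lookup):
    """Run do_lookup(uid) once even if many threads request it concurrently."""
    with _lock:
        event = _inflight.get(uid)
        leader = event is None
        if leader:
            event = threading.Event()
            _inflight[uid] = event

    if leader:
        try:
            _results[uid] = do_lookup(uid)  # the single outbound request
        finally:
            event.set()  # wake any followers, even if the lookup failed
            with _lock:
                _inflight.pop(uid, None)
    else:
        event.wait()  # followers reuse the leader's result
    return _results.get(uid)
```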