Okta incident

Directory agents connection issues on OK14

Minor · Resolved

Okta experienced a minor incident on August 13, 2024, affecting okta.com cell 14 and Core Platform. The incident has been resolved. The status entry stayed open for 392d 10h, but Okta's root cause analysis reports 245 minutes of impact. The full update timeline is below.

Started
Aug 13, 2024, 07:00 AM UTC
Resolved
Sep 09, 2025, 05:42 PM UTC
Duration
392d 10h
Detected by Pingoru
Aug 13, 2024, 07:00 AM UTC

Affected components

okta.com cell 14, Core Platform

Update timeline

  1. resolved Aug 13, 2024, 07:00 AM UTC

    At 12:00 AM PST on August 13, 2024, Okta became aware of Directory Agent connectivity issues that resulted in 503 and 504 errors and affected imports and Delegated Authentication (DelAuth). Okta took corrective action, and the service interruption has been resolved. Additional root cause information will be available within five business days.

  2. resolved Aug 13, 2024, 10:44 AM UTC

    We are continuing to investigate the AD agent connection issue on OK14 and will update this message with more information as soon as it becomes available.

  3. resolved Aug 21, 2024, 05:48 AM UTC

    On August 13, 2024, at 12:02 AM PST, Okta was alerted to an anomaly in US cell 14 affecting some customers using AD, LDAP, and Okta On Prem Provisioning (OPP) agents. A small set of customers experienced intermittent Okta agent connectivity issues, timeouts, and slow authentication response times, and may have received 401 and 500 error codes.

    Impact: Delegated Authentication users would have seen intermittent delays and errors when logging in, and users who updated their profiles or passwords might have seen additional errors.

    Root Cause: During scheduled maintenance of a messaging cluster, the application that retrieves messages from that cluster encountered an unexpected error. This error caused the application to flood the messaging cluster with network connections. The cluster then began rejecting new connections, and the system could not process agent traffic.

    Remediation Steps: The Okta team began diagnosing the issue immediately upon receiving alerts. The investigation initially focused on the messaging cluster, but further diagnosis showed that the message-retrieving application was the source of the problem. Restarting this application restored service.

    Preventative Action: To prevent recurrence, Okta has added capacity to the messaging cluster. Okta is also remediating the software error that caused the connection issue, tuning network connectivity for the messaging cluster, and adding new incident response tooling. Finally, enhanced monitoring and updated runbooks have been put in place to speed up response.

    Duration: 245 minutes