Okta incident

OPA Degradation

Major · Resolved

Okta experienced a major incident on May 8, 2024 affecting Privileged Access, lasting 6d 9h. The incident has been resolved; the full update timeline is below.

Started
May 08, 2024, 04:53 PM UTC
Resolved
May 15, 2024, 01:56 AM UTC
Duration
6d 9h
Detected by Pingoru
May 08, 2024, 04:53 PM UTC

Affected components

Privileged Access

Update timeline

  1. resolved May 08, 2024, 04:53 PM UTC

    Our engineering team became aware of an issue that impacted Okta Privileged Access (OPA) in all cells from approximately 8:06AM to 10:03AM on May 8, 2024. During this time, customers may have experienced issues accessing the OPA console and may have received HTTP 50x and 401 response codes. The engineering team reverted the configuration changes made in OPA to mitigate the issue. Additional root cause information will be available within 5 business days. Affected cells: okta-emea.com:1, okta.com:1, okta.com:2, okta.com:3, okta.com:4, okta.com:6, okta.com:7, okta.com:8, okta.com:9, okta.com:11, okta.com:12, okta.com:14, okta.com:15, okta.com:16, okta.com:17

  2. resolved May 15, 2024, 01:55 AM UTC

    We sincerely apologize for any impact this incident has caused to you, your business, and your customers. At Okta, trust and transparency are two of our top priorities. Outlined below are the facts regarding this incident. We are committed to implementing improvements to the service to prevent future occurrences of this incident.

    Detection and Impact: On May 8 at 8:03am PT, the Okta monitoring system alerted our team to customer errors when accessing the Okta Privileged Access (OPA) service in US Cell 2 and EU Cell 3. At the time, customers attempting to access OPA teams received an error message and were unable to access the service.

    Root Cause Summary: The issue was the result of a configuration change that was incorrectly applied to all OPA cells, making the OPA service inaccessible to customers attempting to access OPA teams.

    Remediation Steps: Upon receiving the alerts, Okta began diagnosing the incident and discovered that it was caused by a configuration change to a library OPA uses to manage how features are enabled. The team rolled back the change, and normal processing resumed at 10:05am PT. A second wave was triggered at 2:07pm PT when our build system accidentally pushed the flawed configuration to all cells a second time. The team immediately diagnosed the issue, rolled back the configuration change again, and put preventative measures in place by removing the automation so that the issue would not be triggered again.

    Preventative Actions: To ensure this does not happen again, Okta is enhancing its testing and promotion procedures to avoid future issues with library configuration changes. We are working to ensure that changes to deployable artifacts are further vetted before their release into our production environments and, once deployed, are immutable.

    Total Duration, Wave 1: start time 8:03am PT, end time 10:05am PT, duration 122 minutes.
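The preventative actions in the postmortem name two controls: vetting configuration changes before they reach production, and making deployed artifacts immutable. Okta does not describe how it implements either, so the following is only a minimal Python sketch under assumed names: `ArtifactRegistry`, `publish`, `promote`, the release IDs, cell names, and the health-check callback are all hypothetical. It shows a write-once registry that rejects a changed configuration under an already-published release, and a cell-by-cell promotion that stops at the first unhealthy cell instead of applying a change to every cell at once.

```python
import hashlib
import json
from types import MappingProxyType
from typing import Callable, Mapping


def digest_of(flags: Mapping[str, bool]) -> str:
    """Content hash of a flag configuration; key order does not matter."""
    payload = json.dumps(dict(flags), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ArtifactRegistry:
    """Hypothetical write-once store: a published release can never change."""

    def __init__(self) -> None:
        self._releases: dict[str, tuple[str, Mapping[str, bool]]] = {}

    def publish(self, release: str, flags: Mapping[str, bool]) -> str:
        digest = digest_of(flags)
        existing = self._releases.get(release)
        if existing is not None and existing[0] != digest:
            # A different config under an existing release ID is refused outright.
            raise ValueError(f"release {release!r} is immutable; publish a new version")
        # Store a read-only copy so later edits to the caller's dict have no effect.
        self._releases[release] = (digest, MappingProxyType(dict(flags)))
        return digest

    def get(self, release: str) -> Mapping[str, bool]:
        return self._releases[release][1]


def promote(
    flags: Mapping[str, bool],
    cells: list[str],
    healthy: Callable[[str, Mapping[str, bool]], bool],
) -> tuple[list[str], bool]:
    """Deploy to one cell at a time, halting at the first failed health check.

    Returns (cells_that_received_the_config, all_healthy).
    """
    reached: list[str] = []
    for cell in cells:
        reached.append(cell)
        if not healthy(cell, flags):
            # Remaining cells never receive the flawed configuration.
            return reached, False
    return reached, True
```

For example, publishing `{"new_feature": True}` under a release that already holds `{"new_feature": False}` raises `ValueError`, so an automated build step cannot silently overwrite what is already running; the change has to ship as a new release and pass through `promote` again.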