Okta experienced a major incident on May 8, 2024 affecting Privileged Access, lasting 22d 16h. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- resolved May 08, 2024, 09:34 PM UTC
Our engineering team became aware of an issue that impacted Okta Privileged Access (OPA) in all cells from approximately 2:15pm PDT to 2:35pm PDT on May 8, 2024. During this time, customers may have experienced issues accessing the OPA console and may have received HTTP 50x and 401 response codes. The engineering team has reverted the configuration changes made in OPA to mitigate the issue. Affected cells: okta-emea.com:1, okta.com:1, okta.com:2, okta.com:3, okta.com:4, okta.com:6, okta.com:7, oktapreview.com:1, oktapreview.com:2, okta.com:8, okta.com:9, okta.com:11, okta.com:12, oktapreview.com:3, okta.com:14, okta.com:16, okta.com:17
- resolved May 08, 2024, 09:52 PM UTC
Our engineering team became aware of an issue that impacted Okta Privileged Access (OPA) in all cells from approximately 2:15pm PDT to 2:35pm PDT on May 8, 2024. During this time, customers may have experienced issues accessing the OPA console and may have received HTTP 50x and 401 response codes. The engineering team has reverted the configuration changes. Additional root cause information will be available within 5 business days.
- resolved May 15, 2024, 01:57 AM UTC
We sincerely apologize for any impact this incident has caused to you, your business, and your customers. At Okta, trust and transparency are two of our top priorities. Outlined below are the facts regarding this incident. We are committed to implementing improvements to the service to prevent future occurrences of this incident.

Detection and Impact: On May 8th at 8:03am PT, the Okta monitoring system alerted our team to customer errors in accessing the Okta Privileged Access (OPA) service in US Cell 2 and EU Cell 3. At the time, customers who attempted to access OPA teams received an error message and were unable to access the service.

Root Cause Summary: The issue was the result of a configuration change that was incorrectly applied to all OPA cells, making the OPA service inaccessible to customers attempting to access OPA teams.

Remediation Steps: Upon receiving alerts, Okta began diagnosing the incident and discovered that the issue was due to a configuration change made to a library that OPA uses to manage how features are enabled. The team rolled back the change, and normal processing resumed at 10:05am PT. A second wave was triggered at 2:07pm PT, when our build system accidentally pushed the flawed configuration to all cells a second time. The team immediately diagnosed the issue, rolled back the configuration change again, and removed the automation so that the issue would not be triggered again.

Preventative Actions: To ensure this does not happen again, Okta is enhancing its testing and promotion procedures to avoid future issues with library configuration changes. We are working to guarantee that changes to deployable artifacts are further vetted before their release into our production environments and, once deployed, cannot be changed further.

Total Duration (Minutes):
Wave 2: Start Time: 2:07pm PT, End Time: 2:35pm PT, Duration: 28 minutes
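The preventative actions above (vetting a change before release, promoting it gradually, and keeping deployed artifacts immutable) can be illustrated with a minimal sketch. This is not Okta's implementation: the `Promoter` class, the `digest` helper, the stage and cell names, and the `health_check` callback are all hypothetical names introduced for illustration. The sketch assumes a configuration is identified by a content hash, so any edit after vetting produces a different hash and is rejected, and that rollout proceeds stage by stage and stops at the first unhealthy stage.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(config: dict) -> str:
    """Return a content hash of a configuration; any edit changes the hash."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Promoter:
    """Hypothetical staged rollout that only ships vetted, unmodified configs."""

    stages: list                                   # ordered list of lists of cell names
    approved: set = field(default_factory=set)     # digests that passed vetting
    deployed: dict = field(default_factory=dict)   # cell name -> deployed digest

    def approve(self, config: dict) -> str:
        """Record that this exact configuration content passed vetting."""
        d = digest(config)
        self.approved.add(d)
        return d

    def promote(self, config: dict, health_check) -> bool:
        """Roll out one stage at a time; stop and roll back on a failed check."""
        d = digest(config)
        if d not in self.approved:
            # Content differs from what was vetted (or was never vetted).
            raise PermissionError("configuration was not vetted")
        for stage in self.stages:
            previous = {cell: self.deployed.get(cell) for cell in stage}
            for cell in stage:
                self.deployed[cell] = d
            if not all(health_check(cell, config) for cell in stage):
                # Restore this stage to its prior state; later stages were never touched.
                for cell, old in previous.items():
                    if old is None:
                        self.deployed.pop(cell, None)
                    else:
                        self.deployed[cell] = old
                return False
        return True


# Hypothetical cell names: one canary stage, then the remaining cells.
promoter = Promoter(stages=[["canary-cell"], ["cell-a", "cell-b"]])
good = {"feature_x": False}
promoter.approve(good)
promoter.promote(good, lambda cell, cfg: True)
```

With this shape, a flawed configuration that fails its health check in the canary stage never reaches the remaining cells, and an automated job cannot push a configuration whose content has not been vetted.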