Okta experienced a minor incident on January 15, 2025 affecting Workflows, lasting 8d 11h. The incident has been resolved; the full update timeline is below.
Update timeline
- resolved Jan 15, 2025, 07:06 PM UTC
The Okta Workflows team became aware of missing telemetry affecting customers on OK1, OK2, OK3, OK4, OK6, OK7, and OK11 between 6pm and 1am PT on January 14th-15th, 2025. During this time, your flows continued to process properly, although it may appear that flows did not execute during this timeframe. Telemetry processing has since been restored.

Root cause information:

We sincerely apologize for any impact this incident has caused to you, your business, or your customers. At Okta, trust and transparency are our top priorities. Outlined below are the facts regarding this incident. We are committed to implementing improvements to the service to prevent future occurrences of this incident.

Detection and Impact
On January 14th at 4:10 pm PST, Okta was alerted to an issue where 8% of Workflows execution history messages could not be processed in Cells OK1, OK2, OK3, OK4, OK6, OK7, OK8, and OK11. This could make it appear that workflow executions were stalled or lost. Workflows continued to execute properly.

Root Cause Summary
Based on our investigation and findings, the root cause of this issue was a configuration error within services managed by a third-party provider during a maintenance window.

Remediation Steps
Immediately upon receiving alerts of network disruptions, Okta Engineering escalated the issues with our provider and worked to implement internal mitigations. Okta worked directly with our provider to mitigate the issue and confirmed full service restoration. Okta's internal mitigations restored full service to the affected cells by 5:50 pm PST on January 14th.

Preventative Actions
Okta will continue working with our third-party service provider to enhance monitoring and expedite the detection of georegional issues with the queuing system. Additionally, we have updated our operational processes to further improve service recovery times. In parallel, we are rolling out a new, more resilient message delivery system for flow execution history data, reducing our dependency on this specific service.

Affected cells: okta.com:1, okta.com:2, okta.com:3, okta.com:4, okta.com:6, okta.com:7, okta.com:11