WorkOS incident

Elevated Platform Error Rates

Critical Resolved View vendor source →

WorkOS experienced a critical incident on July 16, 2025 affecting Dashboard and SSO and 1 more component, lasting 1h 54m. The incident has been resolved; the full update timeline is below.

Started
Jul 16, 2025, 08:51 PM UTC
Resolved
Jul 16, 2025, 10:45 PM UTC
Duration
1h 54m
Detected by Pingoru
Jul 16, 2025, 08:51 PM UTC

Affected components

DashboardSSOAdmin PortalDirectory SyncAuthKit

Update timeline

  1. investigating Jul 16, 2025, 08:51 PM UTC

    We are seeing elevated error rates in AuthKit. Our team is investigating the issue.

  2. investigating Jul 16, 2025, 09:00 PM UTC

    We are seeing elevated error rates across our platform. Our team is investigating the issue.

  3. identified Jul 16, 2025, 09:08 PM UTC

    Our team has identified the issue and is rolling out the fix.

  4. monitoring Jul 16, 2025, 09:17 PM UTC

    We’ve rolled out the fix and are continuing to monitor for errors.

  5. resolved Jul 16, 2025, 10:45 PM UTC

    All services are operational and the incident has been resolved.

  6. postmortem Jul 16, 2025, 11:37 PM UTC

    ## Summary On Wednesday, July 16th, from 20:38 to 21:06 UTC, several WorkOS services were unavailable for some customers. Impacted services included AuthKit, Dashboard, Admin Portal, SSO, Directory Sync, and their APIs. The outage lasted 28 minutes. Customers can opt to use custom domains to serve WorkOS products as part of a branded experience. The incident occurred after a code change introduced a bug in the edge worker responsible for custom domain routing. We identified the issue and rolled back the affected edge worker deployment. We recognize the seriousness of this outage and are committed to maintaining the highest level of reliability across our platform. ## Root Cause Analysis WorkOS supports custom domains across several products, including AuthKit, SSO, Admin Portal, and our APIs, to enable customers to deliver a branded experience. We use edge workers to handle routing logic for these domains. A recent code change introduced a bug in the edge worker that caused it to throw errors during request handling. This bug was not caught by our test suite due to environment differences between the V8 runtime used by our edge workers and the Node.js environment used in tests. The test suite did not replicate the behavior of the V8 runtime closely enough to surface this issue. Differences in how custom domains are configured in our pre-production environments prevented automated health checks from detecting the issue. ## Remediation Edge workers operate in the critical path of our services. In response to this incident, we are taking immediate steps to strengthen our testing and deployment safeguards for edge infrastructure. This includes adding automated health checks for custom domain routes in pre-production and tightening our rollout procedures for edge worker changes.