WorkOS incident

Some API endpoints were erroneously rate-limited

Major, Resolved

WorkOS experienced a major incident on July 23, 2025 affecting Directory Sync, Audit Logs, and AuthKit, lasting 7m. The incident has been resolved; the full update timeline is below.

Started
Jul 23, 2025, 11:09 PM UTC
Resolved
Jul 23, 2025, 11:17 PM UTC
Duration
7m
Detected by Pingoru
Jul 23, 2025, 11:09 PM UTC

Affected components

Directory Sync, Audit Logs, AuthKit

Update timeline

  1. identified Jul 23, 2025, 11:09 PM UTC

    Between 5:50 PM and 10:21 PM UTC, certain API endpoints across AuthKit, Audit Logs, Directory Sync, and the Events API were rate-limited more aggressively than intended. We’ve identified the root cause, deployed a fix, and are currently monitoring the system.

  2. monitoring Jul 23, 2025, 11:10 PM UTC

    We’ve rolled out the fix and are continuing to monitor for errors.

  3. resolved Jul 23, 2025, 11:17 PM UTC

    All services are operational and the incident has been resolved.

  4. postmortem Jul 23, 2025, 11:45 PM UTC

    ## Summary

    On Wednesday, July 23, between 17:50 and 22:21 UTC, several WorkOS API endpoints applied rate limits more aggressively than intended for some customers. The affected services were AuthKit, Directory Sync, Audit Logs, and the Events API. Our published limits (see [workos.com/docs/reference/rate-limits](http://workos.com/docs/reference/rate-limits)) remained unchanged, but a recent code change introduced a bug in the rate-limiting service that prematurely returned HTTP 429 responses. Once identified, we rolled back the change and restored normal operation.

    We understand the seriousness of this disruption and remain committed to delivering the highest level of reliability across our platform.

    ## Root Cause Analysis

    WorkOS enforces rate limits on API endpoints across several products (AuthKit, Audit Logs, Directory Sync, and the Events API) to ensure reliable and predictable uptime. We apply rate limiting at multiple layers of our infrastructure. A recent code change introduced a bug in the application-layer rate-limiting service, which throttled certain traffic more aggressively than documented. The deployment process did not catch this bug because the enforcement logic executed before the logging component captured and recorded the request.

    ## Remediation

    Rate limits operate on the critical path of our services. In response to this incident, we are strengthening our testing, deployment, and observability safeguards across all public API endpoints. This includes capturing observability logs before any other endpoint code executes.
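The ordering problem described in the root cause and remediation can be illustrated with a minimal sketch. This is not WorkOS code: the limiter, the wrapper functions, and the `simulate` helper are all hypothetical names, and the fixed-window limiter is an assumption chosen only for brevity. The sketch shows why a request rejected with HTTP 429 never appears in the logs when enforcement runs before logging, and does appear when logging wraps enforcement.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types and names for illustration only.
Handler = Callable[[str], Tuple[int, str]]  # client_id -> (status, body)


@dataclass
class FixedWindowLimiter:
    """Allows at most `limit` requests per client within one window."""
    limit: int
    counts: Dict[str, int] = field(default_factory=dict)

    def allow(self, client_id: str) -> bool:
        used = self.counts.get(client_id, 0)
        if used >= self.limit:
            return False
        self.counts[client_id] = used + 1
        return True


def with_rate_limit(limiter: FixedWindowLimiter, handler: Handler) -> Handler:
    """Reject with 429 before the wrapped handler ever runs."""
    def wrapped(client_id: str) -> Tuple[int, str]:
        if not limiter.allow(client_id):
            return 429, "Too Many Requests"
        return handler(client_id)
    return wrapped


def with_logging(log: List[int], handler: Handler) -> Handler:
    """Record the status of every request that reaches this layer."""
    def wrapped(client_id: str) -> Tuple[int, str]:
        status, body = handler(client_id)
        log.append(status)
        return status, body
    return wrapped


def endpoint(client_id: str) -> Tuple[int, str]:
    return 200, "ok"


def simulate(logging_first: bool, requests: int, limit: int) -> List[int]:
    """Send `requests` calls from one client and return the logged statuses."""
    log: List[int] = []
    limiter = FixedWindowLimiter(limit=limit)
    if logging_first:
        # Logging is the outermost layer, so 429s are observable.
        app = with_logging(log, with_rate_limit(limiter, endpoint))
    else:
        # Enforcement is outermost, so rejected requests never reach the logger.
        app = with_rate_limit(limiter, with_logging(log, endpoint))
    for _ in range(requests):
        app("client-a")
    return log


print(simulate(logging_first=False, requests=3, limit=2))  # [200, 200]
print(simulate(logging_first=True, requests=3, limit=2))   # [200, 200, 429]
```

With enforcement outermost, the third request is rejected but the log shows only successful calls, so an overly aggressive limit would be invisible to anything that reads the log. With logging outermost, the same rejection is recorded.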