JumpCloud incident

Increased Error Rates Affecting Multiple Platform Services

Severity: Major · Status: Resolved

JumpCloud experienced a major incident on November 4, 2025 affecting LDAP, RADIUS, and several other components, lasting 2h 38m. The incident has been resolved; the full update timeline is below.

Started
Nov 04, 2025, 11:40 AM UTC
Resolved
Nov 04, 2025, 02:19 PM UTC
Duration
2h 38m
Detected by Pingoru
Nov 04, 2025, 11:40 AM UTC

Affected components

LDAP, RADIUS, User Console, TOTP / MFA / JumpCloud Protect, SSO, Admin Console

Update timeline

  1. investigating Nov 04, 2025, 11:40 AM UTC

    We are seeing issues with SSO authentication. We are currently investigating and will provide an update within 1 hour.

  2. investigating Nov 04, 2025, 11:41 AM UTC

    We are continuing to investigate this issue.

  3. investigating Nov 04, 2025, 11:47 AM UTC

    We are continuing to investigate this issue.

  4. identified Nov 04, 2025, 12:20 PM UTC

    We have identified an issue that is causing intermittent login issues with the JumpCloud User Portal and Admin Portal. We have also identified issues with JumpCloud MFA, LDAP, RADIUS, and authentication with SSO. We are working on implementing a fix and will provide another update as soon as possible.

  5. identified Nov 04, 2025, 12:59 PM UTC

    We continue to see intermittent issues with accessing the JumpCloud User and Admin Portal, MFA, LDAP, RADIUS, and SSO. During this time access attempts to LDAP and RADIUS are only impacted if a user is authenticating with MFA. Our team is working on implementing a fix and will provide another update as quickly as possible.

  6. monitoring Nov 04, 2025, 01:17 PM UTC

    We have implemented a fix and users should now be able to access the User and Admin Console, MFA, LDAP, RADIUS, and SSO without issue. We will continue to monitor the results of the fix.

  7. resolved Nov 04, 2025, 02:19 PM UTC

    Services have been fully restored and this incident has been resolved. We will provide a formal postmortem as a follow up.

  8. postmortem Nov 07, 2025, 05:47 PM UTC

    **Date:** Nov 7, 2025
    **Date of Incident:** Nov 4, 2025
    **Description:** RCA for Auth Database Degradation

    **Summary:** On November 4, 2025, a number of customers experienced intermittent failures, timeouts, and increased latency when attempting to authenticate to multiple JumpCloud services, including the consoles, LDAP, RADIUS, and SAML, or when using Multi-Factor Authentication.

    **Root Cause:** The incident was triggered by an issue in the deployment process involving a database schema change and a subsequent application code release. During this deployment, a planned database change unintentionally removed several database indexes required by the existing application code. The sequence of failure was as follows:

    1. Deployment order error: The database schema change (which removed the necessary indexes) was applied to the production database before the new application code (which did not require those indexes) was deployed.
    2. Performance collapse: The existing, high-volume authentication code (used for functions like TOTP and push authentication) was forced to run against the now-inefficient database structure. Queries that normally took milliseconds suddenly took several seconds.
    3. Connection exhaustion: These slow queries held database connections open for extended periods, quickly overwhelming the database server's available connection pool.
    4. Full outage: With no available connections, the main authentication API could not communicate with the database, leading to 100% CPU utilization on the database server and triggering the intermittent timeouts and failures experienced by our customers.

    **Why Testing Did Not Catch This:** The issue was not identified during testing in our Development or Staging environments due to insufficient load simulation. The resource consumption issues and connection exhaustion only manifest under the extreme pressure of peak production traffic volume. The simulated load profiles in our lower environments were not sufficient to expose this specific failure mode.

    **Corrective Actions / Risk Mitigation:**

    1. Mandatory schema change review: All database schema changes must now undergo an additional level of review to explicitly assess index dependencies and impact.
    2. New deployment phasing: We are implementing new tools and checks to enforce that application code dependent on a schema change is deployed before the database change is executed.
    3. Enhanced alerting: We are implementing new monitors and alerts specifically for the Auth API's database connection pool health and CPU utilization.
    4. Enhanced load testing: We are revisiting the load profiles used in our staging environments, looking for opportunities to more accurately simulate peak production traffic.
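The connection-exhaustion mechanism described in the root cause can be illustrated with a minimal sketch. This is not JumpCloud's actual stack; it is a generic simulation, assuming a fixed-size pool and an acquire timeout, showing how queries that jump from milliseconds to seconds (e.g. after losing an index) hold connections long enough that waiting requests start failing:

```python
import threading
import time

class ConnectionPool:
    """Tiny fixed-size pool: acquiring a connection blocks up to `timeout` seconds."""
    def __init__(self, size, timeout):
        self._sem = threading.Semaphore(size)
        self._timeout = timeout

    def run_query(self, duration):
        # Try to grab a connection; give up if none frees up in time.
        if not self._sem.acquire(timeout=self._timeout):
            return "timeout"          # pool exhausted -> request fails
        try:
            time.sleep(duration)      # hold the connection while the query runs
            return "ok"
        finally:
            self._sem.release()

def simulate(pool, query_duration, num_requests):
    """Fire `num_requests` concurrent queries and collect their outcomes."""
    results, lock = [], threading.Lock()
    def worker():
        outcome = pool.run_query(query_duration)
        with lock:
            results.append(outcome)
    threads = [threading.Thread(target=worker) for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Healthy case: millisecond queries cycle through the pool fast enough
# that all 20 requests succeed within the 0.2 s acquire timeout.
fast = simulate(ConnectionPool(size=5, timeout=0.2), query_duration=0.01, num_requests=20)

# Degraded case (analogous to the post-index-drop state): 1 s queries
# hold all 5 connections, so the 15 waiting requests time out.
slow = simulate(ConnectionPool(size=5, timeout=0.2), query_duration=1.0, num_requests=20)
```

In the degraded run only the first five requests get a connection; the rest fail, which mirrors how an authentication API can go from fully healthy to mostly failing without any change in request volume.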