JumpCloud Outage History

JumpCloud is up right now

There were 4 JumpCloud outages since March 12, 2026, totaling 13h 27m of downtime. Each incident is summarised below with details, duration, and resolution information.

Source: http://status.jumpcloud.com

Minor April 28, 2026

Degraded Agent Service on macOS, Windows and Linux

Detected by Pingoru
Apr 28, 2026, 08:01 AM UTC
Resolved
Apr 28, 2026, 09:55 AM UTC
Duration
1h 54m
Affected: Agent
Timeline · 4 updates
  1. investigating Apr 28, 2026, 08:01 AM UTC

    We are currently aware of reports of agent installations failing on macOS, Windows, and Linux. We are investigating the cause and will provide an update within one hour.

  2. identified Apr 28, 2026, 08:57 AM UTC

    The issue has been identified and a fix will be implemented soon.

  3. monitoring Apr 28, 2026, 09:00 AM UTC

    A fix has been implemented, and Agent installations should now be functioning as expected. Our team is actively monitoring the situation to ensure continued stability.

  4. resolved Apr 28, 2026, 09:55 AM UTC

    The incident has been resolved.

Minor April 2, 2026

LDAP Directory Processing Delay

Detected by Pingoru
Apr 02, 2026, 05:08 PM UTC
Resolved
Apr 02, 2026, 05:55 PM UTC
Duration
47m
Affected: LDAP
Timeline · 4 updates
  1. investigating Apr 02, 2026, 05:08 PM UTC

    We are currently investigating a delay in LDAP directory synchronizations. While updates are processing, changes to user attributes and passwords may take longer than expected to propagate. The team is actively working to increase processing capacity and accelerate synchronizations.

  2. identified Apr 02, 2026, 05:16 PM UTC

    The issue has been identified and a fix is being implemented.

  3. monitoring Apr 02, 2026, 05:23 PM UTC

    A fix has been implemented and we are monitoring the results.

  4. resolved Apr 02, 2026, 05:55 PM UTC

    This incident has been resolved.

Minor March 31, 2026

Directory Dispatch Delays

Detected by Pingoru
Mar 31, 2026, 12:32 AM UTC
Resolved
Mar 31, 2026, 06:24 AM UTC
Duration
5h 52m
Affected: Admin Console
Timeline · 6 updates
  1. identified Mar 31, 2026, 12:32 AM UTC

    JumpCloud is currently experiencing dispatch delays for core Directory services. Administrators may experience delays updating associations for users, groups, policies, directories, and commands. We have identified the cause of the issue and are actively implementing a fix.

  2. identified Mar 31, 2026, 02:38 AM UTC

    We have made progress in reducing the backlog affecting Directory services and are continuing to work through the remaining queue. Administrators may still experience delays when updating associations for users, groups, policies, directories, and commands. The team is actively implementing additional changes to increase processing capacity and accelerate resolution. We will provide further updates in one hour.

  3. identified Mar 31, 2026, 03:42 AM UTC

    The backlog affecting Directory services has been significantly reduced and continues to decrease. Administrators may still experience delays when updating associations for users, groups, policies, directories, and commands. The team continues to work on additional changes to accelerate processing. We will provide further updates in an hour.

  4. monitoring Mar 31, 2026, 05:00 AM UTC

    The backlog affecting Directory services continues to decrease and we are approaching resolution. Administrators may still experience some delays when updating associations. The team is continuing to monitor, and our next update will be to confirm full resolution.

  5. resolved Mar 31, 2026, 06:24 AM UTC

    This incident has been resolved.

  6. postmortem Apr 07, 2026, 02:56 PM UTC

    **Date:** Apr 7, 2026
    **Date of Incident:** Mar 30, 2026
    **Description:** RCA for Directory Association Processing Delays

    **Summary:** Starting March 30th at approximately 15:40 MDT, JumpCloud customers experienced significant delays in directory-related updates. This included latency in password changes, user-to-group associations, and outbound provisioning reflecting in downstream systems. The root cause was identified as a specific code deployment in our Devices service that inadvertently flooded a background processing queue with unpartitioned messages, causing a bottleneck that prevented updates from processing in real-time. The issue was fully resolved by 00:25 MDT on March 31, 2026.

    **What Happened:** The incident was caused by a change in how the JumpCloud agent retrieves software application configurations.
    1. **Traffic Spike:** The new code shifted the "source of truth" for these configurations to a new database. If a device polled the system and did not find its record in the new database, the code automatically enqueued a "track collect" request to sync the data.
    2. **Unexpected Volume:** We anticipated a "lazy backfill" (where records are created over time), but underestimated the number of devices that had no existing software bindings. This resulted in an immediate, massive spike of nearly 280,000 messages.
    3. **The Bottleneck (Partitioning):** Crucially, these specific messages were enqueued without a "Partition ID." In our high-scale FIFO (First-In-First-Out) queue architecture, messages without a partition ID are processed one-by-one rather than in parallel. This effectively "serialized" the queue, preventing us from scaling up workers to process the backlog faster and causing the observed latency.

    **Resolution and Recovery:** Once the offending code was rolled back, the "tap" was turned off, and no further unpartitioned messages were added to the queue. Because the bottleneck was caused by the lack of partitioning, simply scaling horizontally could not speed up the processing of the existing backlog. The team monitored the queue throughput and determined that the safest and fastest path to recovery was allowing the worker to process the existing messages sequentially rather than risking further disruption by attempting to manually manipulate the production queue.

    **Corrective Actions:** To ensure this type of bottleneck does not occur again, we have committed to the following:
    * Improving pre-production testing to better simulate the scale and conditions that can occur in production queue processing
    * Reviewing other areas of the platform where similar patterns could produce unexpected request spikes
    * Enhancing monitoring and alerting thresholds to enable faster detection and response when queue backlogs begin to form
    * Strengthening our deployment validation process to more thoroughly account for background data migrations before releasing dependent code changes
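
The partitioning detail is the crux of the backlog: in a FIFO queue keyed by partition ID, separate partitions can be drained by separate workers in parallel, while messages that carry no key all fall into a single ordered stream and can only be worked through one at a time. The Go sketch below illustrates that behaviour in miniature; it is not JumpCloud's queue implementation, and the Message type, partition names, and processing delay are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Message is a hypothetical queue message; only the partition key matters here.
type Message struct {
	PartitionID string // empty string models a message enqueued without a partition ID
	Body        string
}

func main() {
	var wg sync.WaitGroup
	// One channel and one worker per partition key: FIFO order is preserved
	// within a partition, and distinct partitions drain in parallel.
	partitions := make(map[string]chan Message)

	dispatch := func(m Message) {
		ch, ok := partitions[m.PartitionID]
		if !ok {
			ch = make(chan Message, 256)
			partitions[m.PartitionID] = ch
			wg.Add(1)
			go func(key string, in <-chan Message) {
				defer wg.Done()
				for msg := range in {
					time.Sleep(5 * time.Millisecond) // simulated per-message processing cost
					fmt.Printf("partition %q handled %s\n", key, msg.Body)
				}
			}(m.PartitionID, ch)
		}
		ch <- m
	}

	// Keyed traffic spreads across ten workers; unkeyed traffic all lands on
	// the single "" partition and is worked through strictly one-by-one,
	// which is the shape of the backlog described in the postmortem.
	for i := 0; i < 100; i++ {
		dispatch(Message{PartitionID: fmt.Sprintf("org-%d", i%10), Body: fmt.Sprintf("keyed-%d", i)})
		dispatch(Message{PartitionID: "", Body: fmt.Sprintf("unkeyed-%d", i)})
	}

	for _, ch := range partitions {
		close(ch)
	}
	wg.Wait()
}
```

In this toy run the unkeyed stream takes roughly ten times longer to clear than the keyed traffic, because it is confined to a single worker, which is why adding more workers could not speed up the real backlog either.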

Major March 12, 2026

Increased error rates with JumpCloud Agent backend.

Detected by Pingoru
Mar 12, 2026, 04:54 PM UTC
Resolved
Mar 12, 2026, 09:48 PM UTC
Duration
4h 54m
Affected: Agent
Timeline · 7 updates
  1. investigating Mar 12, 2026, 04:54 PM UTC

    We are currently investigating an issue with delays in syncing user, command, and policy information with the JumpCloud Devices Agent backend service. We are investigating the cause and will provide an update within one hour. Agent / Device logins are operating as expected.

  2. investigating Mar 12, 2026, 06:08 PM UTC

    We are continuing to investigate an issue with delays in syncing user, command, and policy information with the JumpCloud Devices Agent backend service. Local Agent / Device logins without MFA, or those leveraging TOTP, are operating as expected.

  3. investigating Mar 12, 2026, 07:24 PM UTC

    We are continuing to investigate an issue with delays in syncing user, command, and policy information with the JumpCloud Devices Agent backend service. Local Agent / Device logins without MFA, or those leveraging TOTP, are operating as expected.

  4. identified Mar 12, 2026, 08:26 PM UTC

    The issue has been identified and we have implemented a fix. We are starting to see some recovery with many agents checking in. We will update as our agent traffic normalizes.

  5. monitoring Mar 12, 2026, 09:08 PM UTC

    Agent traffic is reaching normal levels, and the majority of customers are seeing full service restoration. Our engineering teams continue to actively monitor backend stability and traffic patterns. We expect to move to 'Resolved' status shortly as final systems stabilize.

  6. resolved Mar 12, 2026, 09:48 PM UTC

    This incident has been resolved.

  7. postmortem Mar 17, 2026, 09:25 PM UTC

    **Date:** Mar 17, 2026
    **Date of Incident:** Mar 12, 2026
    **Description:** RCA for Agent Backend (HAProxy) System Degradation

    **Summary:** On March 12, 2026, from 10:05 AM to 2:45 PM MDT, JumpCloud experienced a significant service degradation affecting Agent-related activities. During this window, agent updates, including syncing users, passwords, policies and other agent data, as well as new agent installations, were unavailable. This was caused by a "thundering herd" event triggered by a backend traffic-shaping change. We have since identified the root causes and implemented infrastructure changes to prevent a recurrence.

    **What Happened?** At 10:00 AM MDT, our engineering team enabled a feature flag (a "circuit breaker") designed to protect our System Insights API from high load by returning `503 Service Unavailable` responses for certain non-critical requests. While the flag performed its intended function, it had an unforeseen secondary effect on the JumpCloud Agent’s connection logic. Because the agents could not reuse existing connections for these specific failed requests, hundreds of thousands of agents in our main production environment attempted to establish new mTLS (mutual TLS) connections simultaneously. This created a "Thundering Herd" event that saturated our HAProxy ingress layer, exhausting CPU resources and causing a cascade of connection failures.

    **Root Cause:** The prolonged nature of this incident was the result of three distinct, overlapping bottlenecks that our team had to isolate and resolve one by one:
    1. **CPU-Intensive SSL Handshaking:** Establishing an mTLS connection is a CPU-intensive process. The sheer volume of simultaneous connection attempts pushed our HAProxy pods to their resource limits. This caused the pods to become unresponsive, leading to "Out of Memory" (OOM) kills and failed health probes.
    2. **Health Check Death Spiral:** Our internal health checks initially relied on a Layer 7 SSL validation. Because the CPU was 100% occupied with agent reconnections, the pods couldn't respond to their own health checks in time. This caused the system to erroneously mark healthy pods as "down", removing them from the rotation and further overwhelming the remaining pods.
    3. **Load Balancer Handshake Saturation:** As we attempted to scale our infrastructure, the Application Load Balancer (ALB) encountered a throughput bottleneck specifically related to the rate of new connection establishments. The surge of agents attempting to negotiate new SSL handshakes at the same time exceeded the ALB's burst capacity, temporarily preventing even healthy backend pods from receiving and processing traffic.

    **Why It Took Time to Resolve:** While reverting the flag was the correct first step, the agents were already in an aggressive retry loop that continued even after the 503 errors stopped. We had to experiment with several configurations (adjusting health check intervals and timeout windows) to find a balance that allowed pods to stay "alive" long enough to process the backlog. Stability was achieved only once we implemented Concurrency Control: by lowering the maximum allowed concurrent connections per pod, we stopped the CPU from over-committing to handshakes, allowing the system to reliably process a controlled flow of traffic until the global queue cleared.

    **Corrective Actions / Risk Mitigation:**

    **1.) Edge Infrastructure Hardening:** We have standardized on a new high-availability configuration for our HAProxy ingress layer.
    * **Concurrency Governance:** We have implemented a strict maxconn limit per pod. This acts as a "pressure valve," ensuring that the CPU remains available to process existing requests rather than becoming saturated by new connection attempts.
    * **Dynamic Capacity Management via Autoscaling:** We are implementing Horizontal Pod Autoscaling (HPA) for our HAProxy ingress layer, calibrated to trigger based on both CPU utilization and active connection counts. This ensures we can absorb sudden traffic fluctuations and also maintain a controlled flow of requests to our backend services.

    **2.) Agent Connectivity Optimization:** We are updating the JumpCloud Agent’s communication layer to be more "network-aware" during degraded states:
    * **Enhanced Connection Pooling:** We are reconfiguring the agent's HTTP transport logic to maximize the reuse of existing idle connections. This significantly reduces the "Connection Tax" on our backend during high-traffic events.
    * **Streamlined Resource Handling:** We are implementing stricter protocols for draining and closing HTTP response bodies, ensuring that pooled connections are returned to the rotation immediately and reliably.

    **3.) Adaptive Retry Logic (Jitter):** To further break up "synchronized" traffic spikes:
    * **Introduction of Jitter:** While our agents currently use exponential backoff for poll requests, we are adding randomized "jitter" to our retry intervals. This spreads reconnection attempts across a wider window, preventing large blocks of agents from hitting the service at the exact same millisecond.
    * **Standardizing Resilient Retry Logic:** We are transitioning the Agent’s default HTTP client to a unified **exponential backoff** model for all request types.
    * **Controlled Rollout:** This update will be managed via a staged rollout to monitor for any unforeseen side effects on fleet-wide connectivity patterns.
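
Two of the corrective actions above, connection reuse and jittered exponential backoff, are general client-side patterns rather than anything JumpCloud-specific. The Go sketch below shows one way they fit together; it is an illustration of the techniques named in the postmortem, not the JumpCloud Agent's actual code, and the endpoint, timeouts, and retry limits are invented for the example.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
	"net/http"
	"time"
)

// A shared client with a pooled transport, so retries reuse idle TLS
// connections instead of paying for a new handshake every time.
var client = &http.Client{
	Timeout: 15 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        10,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	},
}

// getWithBackoff retries 5xx responses and transport errors with exponential
// backoff plus random jitter, so a large fleet does not retry in lockstep.
func getWithBackoff(url string, maxAttempts int) (*http.Response, error) {
	base := 500 * time.Millisecond
	for attempt := 0; ; attempt++ {
		resp, err := client.Get(url)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success, or a non-retryable client error
		}
		if resp != nil {
			// Drain and close the body so the connection goes back to the
			// pool rather than being torn down.
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
		}
		if attempt+1 >= maxAttempts {
			if err == nil {
				err = fmt.Errorf("gave up after %d attempts", maxAttempts)
			}
			return nil, err
		}
		backoff := base << attempt // exponential: 0.5s, 1s, 2s, 4s, ...
		// Jitter: sleep somewhere between 0.5x and 1.5x of the backoff window.
		time.Sleep(backoff/2 + time.Duration(rand.Int63n(int64(backoff))))
	}
}

func main() {
	// Hypothetical endpoint, purely for illustration.
	resp, err := getWithBackoff("https://example.com/agent/checkin", 5)
	if err != nil {
		fmt.Println("check-in failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("check-in status:", resp.Status)
}
```

Draining the body before closing it is what lets Go's net/http return the underlying connection to the pool, which is the point behind the "Streamlined Resource Handling" item above, and the randomized sleep is the jitter that keeps a fleet of clients from retrying in the same instant.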

Looking to track JumpCloud downtime and outages?

Pingoru polls JumpCloud's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.
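
For a sense of what that polling involves, here is a minimal sketch in Go, assuming status.jumpcloud.com exposes a standard Statuspage-style /api/v2/status.json endpoint (worth confirming) and leaving the actual alert delivery as a stub; a hosted monitor adds notification routing, history, and the multi-provider dashboard on top of this loop.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// statusResponse mirrors the relevant slice of a Statuspage-style
// /api/v2/status.json payload.
type statusResponse struct {
	Status struct {
		Indicator   string `json:"indicator"` // "none", "minor", "major", "critical"
		Description string `json:"description"`
	} `json:"status"`
}

func fetchStatus(url string) (statusResponse, error) {
	var s statusResponse
	resp, err := http.Get(url)
	if err != nil {
		return s, err
	}
	defer resp.Body.Close()
	return s, json.NewDecoder(resp.Body).Decode(&s)
}

func main() {
	// Assumed endpoint; confirm the status page actually exposes it.
	const url = "https://status.jumpcloud.com/api/v2/status.json"
	last := ""
	for {
		s, err := fetchStatus(url)
		switch {
		case err != nil:
			fmt.Println("poll failed:", err)
		case s.Status.Indicator != last:
			// Stub: swap in email/Slack/webhook delivery here.
			fmt.Printf("status changed to %s: %s\n", s.Status.Indicator, s.Status.Description)
			last = s.Status.Indicator
		}
		time.Sleep(5 * time.Minute) // the 5-minute cadence described above
	}
}
```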

  • Real-time alerts when JumpCloud reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track JumpCloud alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring JumpCloud for free

5 free monitors · No credit card required