UiPath incident

US - Cloud Portal - Partial disruptions

UiPath experienced a minor incident on April 21, 2026, lasting —. The incident has been resolved; the full update timeline is below.

Started: Apr 21, 2026, 05:52 PM UTC
Resolved: Apr 21, 2026, 05:52 PM UTC
Duration: —
Detected by Pingoru: Apr 21, 2026, 05:52 PM UTC

Update timeline

resolved Apr 21, 2026, 05:52 PM UTC

From approximately 14:20 UTC to 17:10 UTC, customers in the United States region may have encountered errors when signing in or navigating within the Cloud Portal. This was due to temporary resource exhaustion on our traffic routing backend, which has been mitigated. We are increasing the system's resilience to such situations to prevent them from affecting customers. The underlying issue also affected some requests to the following services: Computer Vision, Notification Service
postmortem May 26, 2026, 09:05 PM UTC

## Customer Impact

On April 21, 2026, between 14:22 and 17:02 UTC, some customers using UiPath services in the U.S. East region experienced intermittent errors when accessing the platform. During this period, a small percentage of requests returned error responses for approximately 5 minutes. Services were restored to full availability by 17:02 UTC with no data loss or data integrity issues.

The following services were impacted: Computer Vision, Notification Service (a core component responsible for routing platform requests), and Identity (used for user authentication).

## Root Cause

The incident was triggered by a routine platform maintenance job that performs cleanup work after an infrastructure upgrade. As part of this cleanup, every service running in the affected compute cluster was restarted at nearly the same time. The simultaneous restart caused the cluster's combined compute and memory demand to briefly exceed what was available, leaving new service instances unable to start.

The Location Service was particularly sensitive to this condition: during startup, its health checks were very strict. Under temporary resource pressure and incoming service traffic, new instances were unable to complete their startup sequence quickly enough and were repeatedly marked as unhealthy. A single healthy instance carried most of the traffic load for approximately 2.5 hours. When that instance was eventually replaced, there was a window of about 5 minutes during which no instance was fully ready to serve traffic, after which the service recovered automatically.

## Detection and Response

The issue was initially identified by our engineering team through internal monitoring. Upon detection, our on-call team was engaged immediately and began an investigation. The service self-recovered by 17:02 UTC, and a formal incident was declared shortly after. A status update was posted at 17:52 UTC.
## Follow-up

We have begun implementing improvements to prevent a recurrence. Key actions include:

* Adjusting the service startup conditions to make sure new service instances are available before removing the old ones.
* Strengthening safeguards to maintain minimum service availability.
* Improving alerting to detect and notify teams earlier of service instance replica availability issues.
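
The first two follow-up actions can be illustrated with a small simulation. This is a minimal sketch and not UiPath's implementation: `Instance`, `rolling_replace`, `start_new`, and `min_ready` are hypothetical names chosen for illustration. It shows a replacement loop that starts a new instance first, removes the old one only after the new one reports ready, and never lets the ready count fall below a configured minimum.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(eq=False)  # identity-based equality, so list.remove targets the exact object
class Instance:
    """Hypothetical stand-in for a running service instance."""
    name: str
    ready: bool = False


def rolling_replace(
    old: List[Instance],
    start_new: Callable[[str], Instance],
    min_ready: int,
) -> List[Instance]:
    """Replace instances one at a time, starting the new one first.

    An old instance is removed only if its replacement is ready and the
    pool would still hold at least `min_ready` ready instances afterwards.
    Otherwise the replacement is discarded and the old instance is kept.
    """
    pool = list(old)
    for victim in list(old):
        replacement = start_new(victim.name + "-new")
        pool.append(replacement)
        # Count ready instances as if the old one were already gone.
        ready_without_victim = sum(
            1 for inst in pool if inst.ready and inst is not victim
        )
        if replacement.ready and ready_without_victim >= min_ready:
            pool.remove(victim)
        else:
            # Replacement failed readiness: keep serving from the old instance.
            pool.remove(replacement)
    return pool


def healthy(name: str) -> Instance:
    """Assumed start-up path where the new instance passes its checks."""
    return Instance(name, ready=True)


def stuck(name: str) -> Instance:
    """Assumed start-up path where the new instance never becomes ready."""
    return Instance(name, ready=False)
```

With `healthy` as the start function, every old instance is swapped for a ready replacement. With `stuck`, which loosely mirrors new instances failing strict health checks under resource pressure, the old instances stay in place and the pool keeps its ready capacity instead of draining to zero.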