UiPath incident

Multiple services degraded due to auth failures in Europe region

Critical · Resolved

UiPath experienced a critical incident on April 15, 2026, affecting Automation Cloud, Orchestrator, and 8 other components, lasting 23m. The incident has been resolved; the full update timeline is below.

Started
Apr 15, 2026, 09:24 AM UTC
Resolved
Apr 15, 2026, 09:48 AM UTC
Duration
23m
Detected by Pingoru
Apr 15, 2026, 09:24 AM UTC

Affected components

Automation Cloud, Orchestrator, Automation Hub, AI Center, Action Center, Apps, Automation Ops, Computer Vision, Customer Portal, Data Service

Update timeline

  1. investigating Apr 15, 2026, 09:24 AM UTC

    We are currently investigating auth issues in the Europe region causing failures across multiple services. Our engineering team has identified the root cause and is in the process of restoring full functionality.

  2. investigating Apr 15, 2026, 09:30 AM UTC

    We are currently investigating auth issues in the Europe region causing failures across multiple services. Our engineering team has identified the root cause and is in the process of restoring full functionality.

  3. monitoring Apr 15, 2026, 09:39 AM UTC

    Our engineering team has implemented a fix and services are currently stabilising. We are monitoring the system to ensure stability and full recovery. Further updates will be shared soon.

  4. resolved Apr 15, 2026, 09:48 AM UTC

    The issue has been resolved. The system has remained stable during the monitoring period.

  5. postmortem May 28, 2026, 06:23 PM UTC

### Customer Impact

On April 15, 2026, between 8:39 am UTC and 9:07 am UTC, a subset of customers in the Europe region experienced intermittent failures when accessing UiPath services. During this period, customers may have encountered "503 Service Unavailable" and "500 Internal Server Error" messages when attempting to sign in, authenticate, or perform actions requiring identity verification.

The disruptions occurred across two distinct intervals: 8:39–8:47 am UTC (approximately 8 minutes) and 9:00–9:07 am UTC (approximately 7 minutes), totaling roughly 15 minutes of active impact. Affected workflows included authentication, authorization, session management, and any automation or orchestration processes that depend on identity verification. All other regions were unaffected. No data loss occurred.

### Root Cause

The incident was triggered by a sudden, large surge in authentication traffic in the Europe region. At 8:39 am UTC, request volume increased approximately five-fold within seconds, driven by legitimate customer traffic. This traffic surge caused elevated memory pressure on identity service instances, which disrupted the service's connection to its distributed caching layer.

When the cache layer became unavailable due to the service's protective circuit breaker activating, all requests fell through to the underlying database. The resulting surge in database queries overwhelmed the database's concurrent request capacity within seconds, causing the identity service to return 500 and 503 errors across all API endpoints.

### Detection

Our automated monitoring systems detected elevated error rates within 5 minutes of the first customer-facing errors at 8:39 am UTC. The incident was formally declared at 9:05 am UTC during the second impact window, at which point our engineering and site reliability teams were already investigating. Our public status page was updated at 9:27 am UTC to reflect the ongoing investigation.
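
The fall-through effect described under Root Cause can be illustrated with a back-of-the-envelope model. This sketch is not taken from the postmortem: the function name, the baseline of 1,000 requests per second, and the 95% cache hit ratio are hypothetical figures chosen for illustration. Only the five-fold surge comes from the report.

```python
def db_query_rate(request_rate: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the database."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return request_rate * (1.0 - cache_hit_ratio)


# Hypothetical figures, chosen only for illustration.
BASELINE_RPS = 1_000.0
BASELINE_HIT_RATIO = 0.95
SURGE_FACTOR = 5.0  # the "approximately five-fold" surge from the postmortem

normal_load = db_query_rate(BASELINE_RPS, BASELINE_HIT_RATIO)
# Circuit breaker open: the cache is bypassed, so the hit ratio is effectively 0.
fallback_load = db_query_rate(BASELINE_RPS * SURGE_FACTOR, 0.0)
amplification = fallback_load / normal_load

print(f"normal: {normal_load:.0f} qps, fallback: {fallback_load:.0f} qps, "
      f"amplification: {amplification:.0f}x")
```

Under these assumed numbers, a five-fold traffic surge combined with a bypassed cache raises database load by roughly a hundred times, not five. That gap is why a database sized for normal cache-miss traffic can saturate within seconds.
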
### Response

Upon formal incident declaration at 9:05 am UTC, our engineering and site reliability teams launched a coordinated investigation. By 9:08 am UTC, three minutes after engagement, the team identified the overloaded database and confirmed through error log analysis that its concurrent request capacity had been reached. The second impact window had already self-resolved by 9:07 am UTC.

As an additional preventive measure, the team increased resources for the affected database at 9:16 am UTC, ensuring that even under cache-fallback conditions, the database could absorb elevated query volume without reaching its capacity limit. The team continued monitoring and confirmed sustained recovery at 9:39 am UTC. The public status page was updated to "Monitoring" at 9:41 am UTC and "Resolved" at 9:49 am UTC. No further recurrence was observed.

### Follow-Up

To prevent similar incidents in the future, we are implementing several targeted improvements:

1. **Cache resilience improvements.** We are improving caching policies to be more resilient under load.
2. **Enhanced rate-limiting.** We are improving rate-limiting controls to prevent such traffic spikes from causing service unavailability.
3. **Improved scaling policies.** We are revising the auto-scaling configuration to ensure that scaling decisions account for downstream resource capacity, preventing new instances from compounding database saturation during a fallback scenario.

These actions reflect our commitment to continuous improvement and reliable service delivery. We take incidents like this seriously and will continue to invest in the resilience of our platform.
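
The postmortem does not say which rate-limiting mechanism is being improved. As one common illustration of the idea in the rate-limiting follow-up item, the sketch below shows a token bucket, which admits short bursts up to a fixed capacity and rejects requests beyond a sustained rate. The class name, parameters, and injectable clock are all hypothetical and do not describe UiPath's implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token-bucket limiter (hypothetical, not UiPath's design)."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock       # injectable so the limiter can be tested
        self._tokens = capacity   # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits, else False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=2.0, capacity=3.0)
    # A burst of five back-to-back requests; those beyond the burst
    # capacity are rejected until the bucket refills.
    print([bucket.allow() for _ in range(5)])
```

A limiter of this kind, placed in front of the identity service, would shed excess requests early instead of letting every one of them reach the database during a cache outage. Whether it is applied per tenant, per client, or globally is a design decision the postmortem does not cover.
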