UiPath incident

US - Automation Ops - Elevated Error Rate

UiPath experienced a critical incident on April 2, 2026 affecting Automation Ops, lasting 1h 21m. The incident has been resolved; the full update timeline is below.

Started: Apr 02, 2026, 07:11 PM UTC
Resolved: Apr 02, 2026, 08:33 PM UTC
Duration: 1h 21m
Detected by Pingoru: Apr 02, 2026, 07:11 PM UTC

Affected components

Automation Ops

Update timeline

investigating Apr 02, 2026, 07:11 PM UTC

AO-Governance in EUS region is experiencing elevated error rates. Our team is currently investigating
monitoring Apr 02, 2026, 07:53 PM UTC

From approximately 11:00 AM to 12:30 PM PST, the governance service's East US experienced resource exhaustion, UiPath generative AI services may have encountered errors during this time. Our team has scaled resources as a mitigation and investigation is ongoing.
resolved Apr 02, 2026, 08:33 PM UTC

This incident is mitigated.
postmortem Apr 09, 2026, 06:20 PM UTC

## Customer Impact Between April 2, 2026, 18:01 UTC and 19:19 UTC \(approximately 78 minutes\), customers in the US region may have experienced intermittent errors and increased latency when using AI and automation services dependent on governance policy evaluations. ## Root Cause At 18:00 UTC, a scheduled batch job initiated policy evaluations across multiple organizations, generating three times the normal request volume. This sudden surge placed heavy pressure on the governance database. Consequently, services waiting for database responses timed out and automatically retried their requests. Because the original queries were still consuming database resources \(CPU and I/O\) in the background, these immediate retries created an amplification loop that ultimately led to database resource exhaustion. ## Detection Automated monitoring detected the increased error rates within the governance service and immediately alerted our engineering team, who began investigating right away. ## Response Telemetry analysis confirmed database resource exhaustion as the root cause. To mitigate the issue and break the retry amplification loop, the engineering team immediately doubled the database's processing capacity. Following this action, error rates quickly subsided, and the governance service resumed normal processing. ## Follow-Up To prevent this issue from recurring, our engineering teams are prioritizing the following actions: * **Implement Caching:** Introduce caching mechanisms for repeated governance policy lookups to significantly reduce database load during high-volume evaluations. * **Upgrade Infrastructure:** Migrate the database to an elastically scalable model to dynamically handle sudden spikes in traffic, replacing fixed capacity ceilings. * **Enhance System Resilience:** Implement intelligent retry backoff intervals and "circuit breakers" on calling services to prevent future retry amplification loops.