UiPath incident

Europe - Automation Cloud - Tenant is not showing up in the unit region

UiPath experienced a major incident on April 28, 2026 affecting Automation Cloud, lasting 10h 28m. The incident has been resolved; the full update timeline is below.

Started: Apr 28, 2026, 09:30 PM UTC
Resolved: Apr 29, 2026, 07:59 AM UTC
Duration: 10h 28m
Detected by Pingoru: Apr 28, 2026, 09:30 PM UTC

Affected components

Automation Cloud

Update timeline

investigating Apr 28, 2026, 09:30 PM UTC

Some features may be temporarily unavailable during tenant updates. Our team is actively investigating.
identified Apr 28, 2026, 09:59 PM UTC

The issue has been identified, and our team is actively working to resolve it.
identified Apr 28, 2026, 11:03 PM UTC

The mitigation efforts are taking longer than initially anticipated, but the team is actively working to resolve the issue as quickly as possible.
identified Apr 29, 2026, 12:06 AM UTC

Our mitigation efforts are continuing as we work toward a full resolution.
identified Apr 29, 2026, 12:59 AM UTC

A fix has been implemented, and we are actively monitoring the situation to ensure there are no further issues.
identified Apr 29, 2026, 02:00 AM UTC

The deployed fix is still in progress, and we are closely monitoring the environment to observe its behavior and ensure it continues to remain stable.
identified Apr 29, 2026, 03:44 AM UTC

Deployment of the fix is progressing well. We are continuing to monitor the environment closely to ensure stable system behavior.
identified Apr 29, 2026, 05:39 AM UTC

Deployment of the fix is continuing to progress as expected. System behavior remains stable, and we are closely monitoring the environment. We expect this process to take a few more hours and will provide further updates as progress continues.
resolved Apr 29, 2026, 07:59 AM UTC

The issue has been resolved. The fix has been fully deployed and the environment is stable.
postmortem May 11, 2026, 03:19 PM UTC

## Customer impact

Between April 28, 2026 at 8:30 pm UTC and April 29, 2026 at 8:04 am UTC, a subset of customers in the US and EU regions experienced disruptions when adding or editing services within their tenants. Affected customers saw a persistent warning message ("Updating tenant. Some features might be unavailable during the process") and were unable to add, remove, or access services during this time. In some cases, newly enabled features such as AI-powered services did not appear after being added.

Tenants remained stuck in an "updating" state for up to approximately twelve hours during the formal incident window, blocking normal use of the platform. At least one customer experienced this issue for approximately a day before our team identified the broader pattern and began remediation. We understand the significant disruption this caused and sincerely apologize for the impact on your operations.

## Root cause

A surge of lower-priority background maintenance tasks, related to deferred data cleanup, overwhelmed the processing queue responsible for managing tenant setup and configuration workflows. These tasks consumed the majority of available processing capacity, creating a growing backlog. As a result, new tenant operations entered a "pending" state but were never picked up for execution, leaving tenants stuck in the "updating" state indefinitely.

This issue was compounded by database resource contention: the underlying database infrastructure reached approximately 90% of its processing capacity during the incident, further degrading throughput and slowing recovery. The combination of insufficient queue throughput, a lack of prioritization between critical and routine tasks, and constrained database resources created conditions where essential operations, such as enabling or disabling services, were blocked for an extended period.
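
To illustrate the prioritization gap described above, the following is a minimal sketch of a queue that dequeues critical tenant operations ahead of background maintenance work. It is not UiPath's implementation: the class `PriorityTaskQueue`, the priority constants, and the task names are all hypothetical and chosen only for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels; lower numbers are dequeued first.
CRITICAL = 0       # e.g. enabling or disabling a tenant service
MAINTENANCE = 10   # e.g. deferred data cleanup


@dataclass(order=True)
class Task:
    priority: int
    seq: int                          # arrival order, breaks ties within a priority
    name: str = field(compare=False)  # excluded from ordering


class PriorityTaskQueue:
    """Dequeues by priority, then by arrival order within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, name, priority):
        heapq.heappush(self._heap, Task(priority, next(self._counter), name))

    def get(self):
        return heapq.heappop(self._heap).name

    def __len__(self):
        return len(self._heap)


# Simulate a surge of cleanup tasks followed by one critical tenant operation.
queue = PriorityTaskQueue()
for i in range(1000):
    queue.put(f"cleanup-{i}", MAINTENANCE)
queue.put("enable-service:tenant-a", CRITICAL)

first = queue.get()
print(first)  # enable-service:tenant-a
```

In a plain first-in, first-out queue the critical operation would wait behind all 1,000 cleanup tasks. Strict priority ordering has its own trade-off: maintenance work can be starved under sustained critical load, so a production design would likely also need aging or reserved capacity for lower-priority tasks.
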
This pattern is consistent with similar past events where asynchronous state management led to tenants becoming stuck due to unprocessed operations.

Recovery required two interventions: manually removing non-essential maintenance tasks from the processing queue to free up capacity, and temporarily increasing database resources to improve throughput. Once these steps were completed, tenant operations resumed and affected customers regained access to their services. A full resolution was confirmed on a per-customer basis before the incident was closed.

## Detection

Our monitoring systems raised an alert at 8:29 pm UTC on April 28, 2026, and our engineering team immediately began investigating. By 8:52 pm UTC, the team identified a specific error pattern in service logs indicating that concurrent operations were blocking tenant updates. By 9:13 pm UTC, the issue was formally declared a customer-impacting incident after correlating reports from multiple affected customers exhibiting the same symptoms.

While the formal incident window began at approximately 8:30 pm UTC, at least one customer had been experiencing this issue for approximately a day before the broader pattern was identified. This gap highlights an area where we are strengthening our proactive monitoring, as described in our follow-up actions below.

Throughout the incident, our team maintained continuous monitoring of the issue's scope and severity, with regular status page updates to keep customers informed.

## Response

Our engineering team assembled immediately after detection and began reviewing service logs to identify the cause.

* **8:52 pm UTC, April 28:** The team identified the blocking error pattern (concurrent operations preventing tenant updates) and focused investigation on the processing queue.
* **9:00 pm UTC, April 28:** The team confirmed that the processing queue was overloaded with lower-priority background maintenance tasks, preventing critical tenant operations from being processed.
* **12:13 am UTC, April 29:** The team initiated a cleanup of lower-priority maintenance tasks from the processing queue to free up capacity for critical operations.
* **2:28 am UTC, April 29:** The first affected customers were confirmed resolved, with their tenant updates and service access fully restored.
* **2:40 am UTC, April 29:** The team confirmed that a significant number of tenants remained in a pending state and continued working to expedite resolution.
* **2:56 am UTC, April 29:** Additional database capacity was allocated to improve processing throughput, as the underlying database infrastructure had reached approximately 90% of its processing capacity.
* **7:19 am UTC, April 29:** The last tracked affected customer was confirmed fully provisioned.
* **8:04 am UTC, April 29:** The incident was fully resolved and normal service was restored.

Throughout the response, our team posted regular status page updates and performed targeted verification for each affected customer to confirm resolution before closing the incident.

## Follow-up

To prevent recurrence and strengthen our platform's resilience, we are implementing the following improvements:

1. **Automated prioritization of critical operations:** We are updating our processing queue to automatically prioritize essential tenant operations, such as enabling or disabling services, ahead of lower-priority background maintenance tasks.
2. **Investigation into maintenance task surge:** We are conducting a dedicated investigation into the root cause of the unexpected surge in background maintenance tasks that triggered this incident, to prevent similar surges in the future.
3. **Enhanced queue and throughput monitoring:** We are adding automated monitoring and alerting for processing queue backlogs and processing throughput, enabling faster detection before customers are affected.
4. **Database resource saturation alerting:** We are implementing automated alerts for database capacity thresholds to detect resource contention earlier and trigger proactive scaling.
5. **Automated tenant recovery:** We are building periodic reconciliation processes to automatically detect and restore tenants and services to a stable state when operations are delayed or stuck, building on proven remediation strategies from similar past events and reducing reliance on manual intervention.
6. **Increased capacity planning:** We are expanding capacity planning for processing resources during peak operation periods to reduce the risk of bottlenecks.
7. **Improved status communication cadence:** We are refining our status page update processes to ensure more consistent and timely communication during prolonged incidents.
8. **Clearer customer-facing messaging:** We are updating the tenant status messages customers see during updates to provide clearer information about progress and expected resolution times.
9. **Pre-established resolution verification:** We are developing standardized verification procedures to confirm resolution for affected customers more efficiently during future incidents.

We sincerely apologize for the disruption this incident caused. We are committed to learning from this event and similar past incidents, and will continue to invest in the reliability and resilience of our platform to minimize the likelihood and impact of similar issues in the future.
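
As an illustration of the periodic reconciliation idea in the "Automated tenant recovery" follow-up item, the sketch below flags tenant operations that have sat in a pending state longer than a threshold. It is an assumption-laden example, not the vendor's design: the `TenantOperation` record, the state names, the `find_stuck_operations` helper, and the 30-minute threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long an operation may stay pending before it is flagged.
STUCK_AFTER = timedelta(minutes=30)


@dataclass
class TenantOperation:
    tenant_id: str
    state: str             # assumed states: "pending", "running", "done"
    enqueued_at: datetime


def find_stuck_operations(operations, now, stuck_after=STUCK_AFTER):
    """Return pending operations that have waited longer than stuck_after."""
    return [
        op for op in operations
        if op.state == "pending" and now - op.enqueued_at > stuck_after
    ]


# Example data with a fixed clock so the result is deterministic.
now = datetime(2026, 4, 28, 21, 0, tzinfo=timezone.utc)
operations = [
    TenantOperation("tenant-a", "pending", now - timedelta(hours=2)),
    TenantOperation("tenant-b", "pending", now - timedelta(minutes=5)),
    TenantOperation("tenant-c", "done", now - timedelta(hours=3)),
]

stuck = find_stuck_operations(operations, now)
stuck_ids = [op.tenant_id for op in stuck]
print(stuck_ids)  # ['tenant-a']
```

A scheduled job could run a check like this and then re-enqueue or alert on each flagged operation. What the recovery step does for a stuck tenant is not specified in the postmortem and is left out of the sketch.
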