UiPath incident
Multi-Region Failures on start jobs relying on Serverless runtimes
Update timeline
- resolved Apr 14, 2026, 11:22 AM UTC
We observed failures in starting jobs that rely on serverless runtimes across multiple regions. As a result, job executions on serverless runtimes were impacted. The fix has been implemented and is stable.
- postmortem Apr 16, 2026, 12:52 PM UTC
## **Customer Impact**

Between April 14, 2026 at 10:02 am UTC and April 14, 2026 at 11:05 am UTC, a significant number of customers were unable to start jobs that relied on Serverless runtimes. During this period, attempts to initiate Serverless jobs (including scheduled executions, debug sessions, and automation app executions) failed with errors, resulting in automation workflows not running as expected. Customers may have seen error messages (such as conflict errors on job start requests) or found debug functionality unavailable within their automation environments.

This incident affected customers across regions globally using Serverless automation features in our cloud platform. The primary disruption lasted approximately one hour and three minutes. Some customers may have experienced brief residual effects on debug functionality in our web-based design tools shortly after the main issue was resolved.

## **Root Cause**

The incident was triggered by an erroneous update to a configuration setting that controls the availability of Serverless runtimes. The intent behind the change was to disable Serverless and related compute functionality in a specific environment where those services are not deployed. However, an incorrect configuration that contained a negation clause was used. Because this negation clause was not removed, the resulting configuration inverted the intended logic, disabling Serverless runtimes across nearly all regions instead of only in the targeted environment.

As a result, all requests to start jobs that depended on Serverless runtimes were rejected by the platform, returning error responses. This affected serverless workloads, debug jobs, standard automation jobs, and app executions. The misconfiguration propagated rapidly across all affected regions, as our platform services automatically refresh their configuration settings at frequent intervals.

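
The inversion described above can be illustrated with a minimal sketch. The postmortem does not describe the actual configuration format, so the `DisableRule` class, its `negate` field, and the environment names below are all hypothetical; the sketch only shows how a leftover negation turns "disable in one environment" into "disable everywhere except that environment".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisableRule:
    """Hypothetical targeting rule: disable a feature for some environments."""
    environments: frozenset
    negate: bool = False  # True means "match everything NOT in environments"

    def disables(self, environment: str) -> bool:
        matched = environment in self.environments
        return not matched if self.negate else matched


# Hypothetical environment names, used only for illustration.
ALL_ENVIRONMENTS = ["env-a", "env-b", "env-c", "env-target"]


def disabled_environments(rule: DisableRule) -> list:
    """Return the environments in which the rule disables the feature."""
    return [env for env in ALL_ENVIRONMENTS if rule.disables(env)]


# Intended change: disable only in the targeted environment.
intended = DisableRule(frozenset({"env-target"}))

# Erroneous change: same target, but the negation clause was left in place.
erroneous = DisableRule(frozenset({"env-target"}), negate=True)

print(disabled_environments(intended))
print(disabled_environments(erroneous))
```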
Once the error was identified through a review of the recent configuration change, the negation clause was removed and the corrected configuration was redeployed. Our platform services picked up the fix within their regular refresh cycle, restoring Serverless job execution capability across all regions. Full recovery was confirmed shortly after the corrected configuration was deployed.

## **Detection**

Initial automated alerts were generated at 10:06 am UTC. However, the correlation of these alerts to a customer-impacting issue was delayed due to a high volume of concurrent alert activity. The engineering team confirmed the impact and began an active investigation at 10:50 am UTC.

## **Response**

Upon engagement at 10:50 am UTC, engineers began investigating the root cause. By 10:54 am UTC, the team had identified a recent configuration change as a likely cause. The configuration was reviewed on an incident response call, and the erroneous negation clause was confirmed as the source of the problem: it had inverted the intended logic, disabling Serverless runtimes globally rather than in a single targeted environment.

At 10:56 am UTC, a corrective update to the configuration was prepared. By approximately 10:59 am UTC, the fix was deployed. Our platform services, which automatically refresh configuration settings every ten seconds, began picking up the corrected setting shortly thereafter. By approximately 11:03 am UTC, the team observed successful job start responses from the platform, confirming that mitigation was taking effect. By 11:05 am UTC, monitoring confirmed that Serverless job execution had been fully restored across all regions, and customers were once again able to start jobs as expected.

Throughout the response, the team verified recovery through both service metrics and direct testing of job execution. A status page update and impact summary were also prepared for affected customers.

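
For readers who want the intervals between these milestones, the following sketch computes them from the timestamps stated in this report. The dictionary keys and the helper names are illustrative labels, not terms used by the vendor, and the 10:59 am deployment time is given in the report as approximate.

```python
from datetime import datetime, timezone


def utc(hour: int, minute: int) -> datetime:
    """Build a UTC timestamp on the incident date, April 14, 2026."""
    return datetime(2026, 4, 14, hour, minute, tzinfo=timezone.utc)


# Timestamps taken from the report; key names are illustrative labels.
TIMELINE = {
    "impact_start": utc(10, 2),
    "first_alert": utc(10, 6),
    "investigation_start": utc(10, 50),
    "cause_identified": utc(10, 54),
    "fix_deployed": utc(10, 59),  # stated as approximate in the report
    "full_recovery": utc(11, 5),
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named milestones."""
    delta = TIMELINE[end] - TIMELINE[start]
    return int(delta.total_seconds() // 60)


for start, end in [
    ("impact_start", "first_alert"),
    ("first_alert", "investigation_start"),
    ("investigation_start", "fix_deployed"),
    ("fix_deployed", "full_recovery"),
    ("impact_start", "full_recovery"),
]:
    print(f"{start} -> {end}: {minutes_between(start, end)} min")
```

On these figures, most of the 63-minute disruption falls in the 44 minutes between the first automated alert and the start of active investigation, which is the gap the monitoring follow-up item targets.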
## **Follow-up**

To prevent similar incidents in the future, we are implementing several targeted improvements:

1. **Remove capability to disable service via Feature Flag**: We are removing the possibility to disable a service by a simple feature flag and will rely on a static service configuration that follows our Secure Deployment Principles through ringed rollouts.
2. **Stronger review and governance processes**: We are improving documentation, peer review requirements, and governance controls for configuration changes that affect service availability, ensuring that changes with broad impact receive appropriate scrutiny before deployment.
3. **Faster detection through improved monitoring**: We are enhancing monitoring and alerting to detect abnormal drops in job execution success rates within minutes, reducing detection time for service-impacting issues. This includes reviewing how existing health checks interact with traffic-based alerting to eliminate blind spots that delayed detection in this incident.

We understand how disruptive this incident was to your automation workflows, and we sincerely apologize for the impact. These improvements are already underway, and we are committed to learning from this event. We will continue to invest in the reliability and resilience of our platform to better support your business.