UiPath Outage History

UiPath is up right now

There have been 65 UiPath outages since February 23, 2026, totaling 58h 10m of downtime. Each is summarised below with incident details, duration, and resolution information.

Source: https://status.uipath.com

Major May 1, 2026

Automation Hub Service Degradation

Detected by Pingoru
May 01, 2026, 11:44 PM UTC
Resolved
May 02, 2026, 12:24 AM UTC
Duration
40m
Affected: Automation Hub
Timeline · 5 updates
  1. investigating May 01, 2026, 11:44 PM UTC

    We’re currently experiencing issues with Automation Hub, and some users may be unable to access the service. Our team is actively investigating and implementing mitigation steps. We’ll provide updates as soon as we have more information.

  2. investigating May 01, 2026, 11:45 PM UTC

    We’re currently experiencing issues with Automation Hub, and some users may be unable to access the service. Our team is actively investigating and implementing mitigation steps. We’ll provide updates as soon as we have more information.

  3. identified May 01, 2026, 11:48 PM UTC

    The issue has been identified, and our team is actively implementing a fix. We’re working to restore full service as quickly as possible.

  4. monitoring May 01, 2026, 11:51 PM UTC

    The fix has been deployed, and our team is actively monitoring the system to ensure stability.

  5. resolved May 02, 2026, 12:24 AM UTC

    The issue has been fully resolved, and service has been restored. All systems are now operational.

Read the full incident report →

Major May 1, 2026

Multiple Regions - Signing With Microsoft is momentarily impacted

Detected by Pingoru
May 01, 2026, 05:59 PM UTC
Resolved
May 01, 2026, 06:37 PM UTC
Duration
37m
Affected: Automation Cloud, Customer Portal
Timeline · 4 updates
  1. investigating May 01, 2026, 05:59 PM UTC

    Signing With Microsoft is momentarily impacted

  2. identified May 01, 2026, 06:10 PM UTC

    The issue has been identified, and the team is actively working on mitigation. Further updates will be shared as progress continues.

  3. monitoring May 01, 2026, 06:35 PM UTC

    The issue has been mitigated, and services are now operating normally. We will continue to monitor the situation.

  4. resolved May 01, 2026, 06:37 PM UTC

    The issue has been mitigated, and services are now operating normally. We will continue to monitor the situation.

Read the full incident report →

Critical April 30, 2026

Australia - Studio web & Orchestrator - Unable to access

Detected by Pingoru
Apr 30, 2026, 07:43 AM UTC
Resolved
Apr 30, 2026, 08:24 AM UTC
Duration
40m
Affected: Orchestrator, Studio Web
Timeline · 4 updates
  1. investigating Apr 30, 2026, 07:43 AM UTC

    We are currently investigating an issue where users are unable to access Studio Web in the Australia region. Our team is actively working to identify the root cause and resolve it as quickly as possible.

  2. investigating Apr 30, 2026, 08:01 AM UTC

    We are continuing to investigate this issue.

  3. monitoring Apr 30, 2026, 08:12 AM UTC

    A fix has been implemented for this issue. We are currently monitoring the service to ensure stability.

  4. resolved Apr 30, 2026, 08:24 AM UTC

    The issue has been resolved. Users should now be able to access the service normally.

Read the full incident report →

Major April 28, 2026

US - IXP - Delays with IXP extraction in US region

Detected by Pingoru
Apr 28, 2026, 10:19 PM UTC
Resolved
Apr 28, 2026, 11:36 PM UTC
Duration
1h 16m
Affected: IXP
Timeline · 3 updates
  1. investigating Apr 28, 2026, 10:19 PM UTC

    IXP data extractions are currently experiencing delays and intermittent failures in our US region. Our team is actively investigating the root cause.

  2. monitoring Apr 28, 2026, 11:07 PM UTC

    A mitigation has been implemented for the IXP data extraction issue, and processing speeds are returning to normal. We are continuing to monitor the system to ensure full stability.

  3. resolved Apr 28, 2026, 11:36 PM UTC

    This incident has been resolved. IXP data extractions are processing successfully at normal speeds.

Read the full incident report →

Major April 28, 2026

Europe - Automation Cloud - Tenant is not showing up in the unit region

Detected by Pingoru
Apr 28, 2026, 09:30 PM UTC
Resolved
Apr 29, 2026, 07:59 AM UTC
Duration
10h 28m
Affected: Automation Cloud
Timeline · 9 updates
  1. investigating Apr 28, 2026, 09:30 PM UTC

    Some features may be temporarily unavailable during tenant updates. Our team is actively investigating

  2. identified Apr 28, 2026, 09:59 PM UTC

    The issue has been identified, and our team is actively working to resolve it

  3. identified Apr 28, 2026, 11:03 PM UTC

    The mitigation efforts are taking longer than initially anticipated, but the team is actively working to resolve the issue as quickly as possible.

  4. identified Apr 29, 2026, 12:06 AM UTC

    Our mitigation efforts are continuing as we work toward a full resolution

  5. identified Apr 29, 2026, 12:59 AM UTC

    A fix has been implemented, and we are actively monitoring the situation to ensure there are no further issues

  6. identified Apr 29, 2026, 02:00 AM UTC

    The deployed fix is still in progress, and we are closely monitoring the environment to observe its behavior and ensure it continues to remain stable.

  7. identified Apr 29, 2026, 03:44 AM UTC

    Deployment of the fix is progressing well. We are continuing to monitor the environment closely to ensure stable system behavior.

  8. identified Apr 29, 2026, 05:39 AM UTC

    Deployment of the fix is continuing to progress as expected. System behavior remains stable, and we are closely monitoring the environment. We expect this process to take a few more hours and will provide further updates as progress continues.

  9. resolved Apr 29, 2026, 07:59 AM UTC

    The issue has been resolved. The fix has been fully deployed and the environment is stable.

Read the full incident report →

Major April 27, 2026

Multiple Regions - Autopilot for Everyone

Detected by Pingoru
Apr 27, 2026, 10:11 PM UTC
Resolved
Apr 27, 2026, 11:21 PM UTC
Duration
1h 9m
Affected: Autopilot for Everyone
Timeline · 4 updates
  1. investigating Apr 27, 2026, 10:11 PM UTC

    The GPT-4o mini model is not available and is currently being investigated.

  2. investigating Apr 27, 2026, 10:19 PM UTC

    The GPT-4o mini model is not available and is currently being investigated.

  3. monitoring Apr 27, 2026, 10:53 PM UTC

    The service is now showing signs of recovery and appears healthy. We are continuing to monitor the situation closely

  4. resolved Apr 27, 2026, 11:21 PM UTC

    Service has been fully restored to a healthy state, and the issue has been mitigated.

Read the full incident report →

Major April 24, 2026

US - Cloud Robots VM - 3rd party service provider outage

Detected by Pingoru
Apr 24, 2026, 04:14 PM UTC
Resolved
Apr 25, 2026, 12:02 AM UTC
Duration
7h 48m
Affected: Cloud Robots - VM
Timeline · 12 updates
  1. identified Apr 24, 2026, 04:14 PM UTC

    The upstream cloud provider has confirmed an outage impacting VMs for Cloud Robots VM in the US and Delayed US regions. Impact: Users may be unable to start robots. Next update: We are working with the provider to understand mitigation timelines.

  2. identified Apr 24, 2026, 05:05 PM UTC

    We are still awaiting further details from the cloud service provider. We are exploring failover options as well

  3. identified Apr 24, 2026, 05:58 PM UTC

    The cloud service provider has identified the issue and started applying a mitigation; we are continuing to follow up with them for more updates. We do not yet have an ETA for when the mitigation will be completed.

  4. monitoring Apr 24, 2026, 06:10 PM UTC

    The cloud service provider has applied mitigations and is starting to see improvements from their end. We are monitoring our services to ensure they are recovering as well.

  5. monitoring Apr 24, 2026, 06:59 PM UTC

    Some VMs have still not recovered, and the cloud service provider is still actively working on completing their mitigation efforts

  6. monitoring Apr 24, 2026, 07:47 PM UTC

    The cloud service provider is still actively working on completing their mitigation efforts

  7. monitoring Apr 24, 2026, 08:23 PM UTC

    We are seeing the remaining VMs recover, we will monitor to ensure there is no regression

  8. monitoring Apr 24, 2026, 09:17 PM UTC

    While we are not seeing any more impact on Cloud Robot VMs, we are continuing to follow the cloud provider's outage until it is fully resolved.

  9. monitoring Apr 24, 2026, 10:10 PM UTC

    We are continuing to follow the cloud provider's outage until it is fully resolved.

  10. monitoring Apr 24, 2026, 11:12 PM UTC

    We are continuing to follow the cloud provider's outage until it is fully resolved.

  11. resolved Apr 25, 2026, 12:02 AM UTC

    The issue has been resolved

  12. postmortem Apr 28, 2026, 04:12 PM UTC

    ## Customer Impact

    Between approximately 3:00 pm UTC on April 24, 2026, and 12:04 am UTC on April 25, 2026, a subset of customers in the US Region experienced failures when starting, restarting, or provisioning cloud robots (virtual machines) through Automation Cloud. Impacted customers encountered errors such as a “Partially initialized” or “Failed” status on machines. Existing virtual machines that were already running generally continued to operate, but attempts to provision new machines or restart stopped ones frequently failed. Some customers also experienced delays in job scheduling. The disruption lasted approximately nine hours. We sincerely apologize for the impact this incident had on your automation workflows. We understand that reliable cloud robot availability is critical to your operations, and we take this disruption seriously.

    ## Root cause

    The incident was caused by a widespread outage affecting the virtual machine service of our underlying cloud infrastructure provider in the US Region. The provider's outage began at 11:39 am UTC on April 24, 2026—several hours before customer-facing impact became apparent—when a recent deployment to their virtual machine platform introduced a fault that disrupted the ability to start, restart, or provision new virtual machines across multiple availability zones. The outage affected multiple infrastructure services beyond virtual machines, including networking, caching, and container orchestration components. At the time of initial investigation, multiple machines were found with a requested status of “running” but an actual status of “partially initialized”, all having transitioned to this failed state within a narrow window around 3:00–3:30 pm UTC. In addition, the provider's outage caused connectivity failures within portions of our service infrastructure in the US Region, which led to errors surfacing to customers interacting with the platform.

    ## Detection

    The incident was first detected at approximately 3:00 pm UTC on April 24, 2026, when the first customer report was received indicating that cloud robots could not be started. Automated monitoring surfaced error patterns including “partially initialized” shortly thereafter. By 3:52 pm UTC, the incident was formally declared and a response team was assembled. Because the failures were mostly limited to VM lifecycle operations rather than running workloads, existing automated health checks did not trigger for all affected scenarios. There was a gap of approximately 15–20 minutes between the first customer report and full scope determination, as the initial impact was sporadic and became clear only after correlating customer reports with infrastructure telemetry. The team identified multiple machines in a failed state across multiple affected organizations. By 4:15 pm UTC, the scope was sufficiently understood, and a status page update was posted to inform customers of the identified issue. A high-priority support case was also opened with the cloud provider at this time. Notably, the provider's outage had begun at 11:39 am UTC, over three hours before customer-facing impact was detected. The delay between the provider's outage start and observable customer impact is being examined as part of our detection improvement efforts.

    ## Response

    Upon detection, our engineering team immediately began investigating the scope and root cause of the failures. By querying internal systems, the team identified that the issue was isolated to the US Region and primarily affected VM start and provisioning operations. The team correlated affected accounts and machines to determine the breadth of impact. Simultaneously, the team investigated broader service degradation and discovered that portions of our service infrastructure in the US Region were experiencing connectivity failures caused by the provider's outage. This caused some platform calls to time out, contributing to automation job scheduling delays. The team traced these failures to specific infrastructure hosts that had been impaired by the provider's outage. The following mitigation actions were taken:

    * **Service component relocation:** At approximately 6:20 pm UTC, affected service components were relocated from impaired infrastructure hosts to healthy ones. This was performed carefully, one component at a time, to minimize risk. After relocation, the platform “hypervisor” service began responding to calls successfully again.
    * **Cloud provider engagement:** A high-priority support case was opened with the cloud provider, and the team monitored their public status page for updates. The provider confirmed at approximately 5:40 pm UTC that they had begun reverting their faulty deployment. The team also submitted a detailed list of affected VM identifiers to the provider's support case to assist their investigation.
    * **VM recovery testing:** The team conducted targeted tests on affected VMs to verify restoration. Some VMs in certain availability zones remained impacted even after initial mitigations, as the provider's recovery progressed zone by zone. By 8:12 pm UTC, previously affected VMs were confirmed operational, even before the provider had updated their own status page. However, at approximately 10:05 pm UTC, the provider reported a regression in one availability zone and initiated a second corrective action expected to take up to three hours, extending the monitoring period.

    By April 25, 2026 at 12:04 am UTC, VM operations were consistently succeeding across all availability zones, and the incident was marked as resolved. Throughout the event, we maintained regular status page updates and communicated directly with impacted customers, including proactive outreach to verify that affected machines had returned to normal operation.

    ## Follow-up

    To reduce the risk and impact of similar incidents in the future, we are implementing several targeted improvements:

    1. **Enhanced detection and alerting:** We are expanding our monitoring to include more granular checks on VM lifecycle operations, ensuring that failures in start, restart, or provisioning actions are surfaced immediately—even when running workloads are unaffected. This includes adding VM health monitoring capabilities that were not previously in place; correcting alert configurations that referenced incorrect regions during this incident; and exploring earlier detection of upstream provider outages before they manifest as customer-facing impact. We are also investigating ways to reduce the three-hour gap between the provider's outage onset and our initial detection of customer impact.
    2. **Automated impact correlation:** We are developing automated tooling to rapidly identify affected accounts and machines based on error states, enabling faster scoping and customer notification. During this incident, impact assessment required manual queries; we are automating this process to significantly reduce response time.
    3. **Regional failover readiness:** We are investing in infrastructure changes to support more flexible failover and workload migration for cloud robots, including the ability to provision new VMs in alternate regions when a primary region is impaired. Currently, cloud robot VMs are region-bound and no backup provisioning path exists in a secondary region. We are addressing this gap to provide greater resilience against single-region provider outages—a recurring pattern we have observed across similar past events.
    4. **Customer guidance and communication:** We are updating our customer-facing documentation and in-product messaging to provide clear guidance on steps to take when VM operations fail due to underlying infrastructure outages. We are also improving our status page update cadence and clarity to keep customers better informed during extended incidents.

    This incident follows a pattern seen in similar past events, where external platform outages in a single region have disrupted automation services. We are applying lessons learned from those events—including the importance of rapid detection, clear customer communication, and resilient failover strategies—to drive systematic improvements. Our commitment is to continually strengthen our platform's reliability and transparency, so customers can trust Automation Cloud for their most critical workloads.
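The "automated impact correlation" follow-up above can be illustrated with a minimal sketch. This is not UiPath's internal tooling; the record fields (org_id, requested_status, actual_status, failed_at) and the incident window are hypothetical, chosen only to mirror the "requested running / actually partially initialized" pattern described in the postmortem.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical machine records; the schema is illustrative only.
machines = [
    {"org_id": "org-a", "vm_id": "vm-1", "requested_status": "running",
     "actual_status": "partially initialized",
     "failed_at": datetime(2026, 4, 24, 15, 5, tzinfo=timezone.utc)},
    {"org_id": "org-b", "vm_id": "vm-2", "requested_status": "running",
     "actual_status": "running", "failed_at": None},
]

WINDOW_START = datetime(2026, 4, 24, 15, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 4, 24, 15, 30, tzinfo=timezone.utc)
FAILED_STATES = {"partially initialized", "failed"}


def correlate_impact(records):
    """Group machines that entered a failed state inside the incident window, by organization."""
    impacted = defaultdict(list)
    for m in records:
        mismatch = m["requested_status"] == "running" and m["actual_status"] in FAILED_STATES
        in_window = m["failed_at"] is not None and WINDOW_START <= m["failed_at"] <= WINDOW_END
        if mismatch and in_window:
            impacted[m["org_id"]].append(m["vm_id"])
    return dict(impacted)


print(correlate_impact(machines))  # {'org-a': ['vm-1']}
```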

Read the full incident report →

Notice April 24, 2026

Multiple service outage on US region

Detected by Pingoru
Apr 24, 2026, 11:15 AM UTC
Resolved
Apr 24, 2026, 11:15 AM UTC
Duration
Timeline · 1 update
  1. resolved Apr 24, 2026, 11:15 AM UTC

    Between 10:00 and 10:20 UTC on 24th of April 2026, several AI-dependent services in the US region were impacted — including Communications Mining, Computer Vision, IXP, Document Understanding, ScreenPlay, and Project Delegate. We understand that some customers were affected, and we will publish a Root Cause Analysis with action items to prevent recurrence.

Read the full incident report →

Major April 23, 2026

Canada - Orchestrator - Create Tenants are failing

Detected by Pingoru
Apr 23, 2026, 05:15 PM UTC
Resolved
Apr 23, 2026, 09:54 PM UTC
Duration
4h 38m
Affected: Orchestrator
Timeline · 5 updates
  1. investigating Apr 23, 2026, 05:15 PM UTC

    We are investigating reports of an outage impacting Tenant Creation for Orchestrator in Canada. Impact: Users may be unable to Create tenants in Canada. Next update: Our teams are working to understand the cause and scope and will share updates as available.

  2. investigating Apr 23, 2026, 06:16 PM UTC

    We are continuing to investigate the cause of the outage impacting Tenant Creation for Orchestrator in Canada and are working on a fix. Impact: Users may continue to be unable to Create tenants. Next update: Our focus is on restoring service as quickly as possible.

  3. investigating Apr 23, 2026, 07:37 PM UTC

    We continue to investigate the cause of the issue impacting Tenant Creation for Orchestrator in Canada and are working on a fix. Impact: Users may continue to be unable to Create tenants. Next update: Our focus is on restoring service as quickly as possible.

  4. identified Apr 23, 2026, 09:05 PM UTC

    We have identified the cause of the outage impacting Tenant creation for Orchestrator in Canada and are working on a fix. Impact: Users may continue to be unable to Create tenants in the Canada region. Next update: Our focus is on restoring service as quickly as possible.

  5. resolved Apr 23, 2026, 09:54 PM UTC

    The issue is resolved and Tenant Creation in Canada is operating as expected. Impact: All user facing functionality is available.

Read the full incident report →

Major April 23, 2026

Multiple Regions - Multiple Services - Agent runtime errors

Detected by Pingoru
Apr 23, 2026, 03:24 PM UTC
Resolved
Apr 23, 2026, 11:36 PM UTC
Duration
8h 11m
Affected: Agents
Timeline · 12 updates
  1. investigating Apr 23, 2026, 03:24 PM UTC

    We are investigating reports of an outage impacting Agents in US and Europe Regions. Impact: Users may be unable to start Agents. Next update: Our teams are working to understand the cause and scope and will share updates as available.

  2. investigating Apr 23, 2026, 04:17 PM UTC

    We are continuing to investigate the cause of the issues; we will provide an update as soon as we begin working on a mitigation.

  3. investigating Apr 23, 2026, 04:44 PM UTC

    Further investigation has revealed that this issue is affecting all regions, so customers may experience Agents failing to start across all UiPath Automation Cloud regions. We are continuing to investigate the root cause of the issue and will be working to mitigate it promptly.

  4. identified Apr 23, 2026, 05:21 PM UTC

    We have identified a defect in a recent deployment, which is causing the issue. We are preparing to roll back the change, which will return Agents to full functionality for all customers.

  5. identified Apr 23, 2026, 06:18 PM UTC

    We are validating the reversion of the improper configuration in a lower environment, and will begin deploying it to production regions shortly

  6. identified Apr 23, 2026, 07:16 PM UTC

    We are still working on a mitigation for this issue

  7. identified Apr 23, 2026, 08:14 PM UTC

    We have a mitigation and are validating it prior to deploying it to production

  8. identified Apr 23, 2026, 09:06 PM UTC

    We are continuing to develop and validate a mitigation for this issue

  9. identified Apr 23, 2026, 09:29 PM UTC

    We are still working on a mitigation. As a temporary workaround: The issue appears to be related to the IP Restriction security setting, as internal testing has shown that disabling it mitigates the issue. We understand that disabling this feature may not be possible for all customers, but we wish to provide it as an option. In order to apply this mitigation: navigate to Admin -> Security Settings -> IP Restriction

  10. identified Apr 23, 2026, 09:54 PM UTC

    We have identified a mitigation and validated it in a lower environment. We are rolling out the mitigation to all affected regions.

  11. monitoring Apr 23, 2026, 10:49 PM UTC

    The mitigation has been rolled out, and we are validating that it has resolved the issue

  12. resolved Apr 23, 2026, 11:36 PM UTC

    The issue has been resolved

Read the full incident report →

Major April 22, 2026

Delayed US - IXP - Model unavailable

Detected by Pingoru
Apr 22, 2026, 02:18 PM UTC
Resolved
Apr 22, 2026, 03:22 PM UTC
Duration
1h 4m
Affected: IXP
Timeline · 3 updates
  1. monitoring Apr 22, 2026, 02:18 PM UTC

    We identified an issue that caused some IXP models to be unavailable or incorrectly shown as “not found.” A fix has been deployed, and we are continuing to monitor the situation closely. We will share a detailed root cause analysis for this incident at a later time.

  2. resolved Apr 22, 2026, 03:22 PM UTC

    In the GXP US region, between 13:28:33 UTC and 13:52:42 UTC on April 22, 2026, customers experienced HTTP 404 errors when retrieving extractions for IXP UCD models, resulting in a full outage of runtime extractions during that period. The issue also impacted CM predictions, with additional 404 errors observed. This issue has since been resolved.

  3. postmortem Apr 30, 2026, 03:11 PM UTC

    ### Customer Impact

    Between April 22, 2026 at 1:28 pm UTC and April 22, 2026 at 1:52 pm UTC, some customers in the US region were unable to perform document extractions using IXP. During this window, all extraction and prediction requests against pinned model versions returned HTTP 404 Not Found errors. We sincerely apologize for the disruption this caused to your operations.

    ### Root Cause

    The incident was triggered by a deployment of a new release of IXP in the impacted region. A code change that required a manual preparation step to be completed before deployment was released before that step had been run. In environments with more frequent release schedules, this step is handled automatically by an existing part of the deployment process. However, the US and EU regions follow a less frequent, consolidated release schedule, so this automation was not present and the required manual step was missed. As a result, several components of the model management system began reading data from an incorrect location. Because the required data did not exist, requests for extractions and predictions against pinned model versions returned 404 Not Found errors.

    ### Detection

    Automated monitoring detected the issue after the deployment completed, at approximately 1:28 pm UTC on April 22, 2026. These monitoring checks, which continuously validate core user workflows including extraction against pinned model versions, began failing as soon as the service started returning errors.

    ### Response

    Upon detection, our engineering team identified the root cause and manually ran the required step against the affected environment, restoring the model management system to read from the correct location. After a sustained monitoring period to verify stable recovery—including confirmation that automated monitoring checks resumed passing and that traffic returned to normal response patterns—the incident was confirmed as fully resolved at 3:22 pm UTC.

    ### Follow-Up

    The IXP delayed release process will be reviewed, with a specific focus on why this issue was not seen in earlier regular releases.

Read the full incident report →

Notice April 22, 2026

Japan - Apps - Service Unavailability

Detected by Pingoru
Apr 22, 2026, 01:20 PM UTC
Resolved
Apr 22, 2026, 01:20 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 22, 2026, 01:20 PM UTC

    From approximately 09:52 UTC to 10:21 UTC, customers in the Japan region may have been unable to access the App Service. This was due to an issue in our deployment pipeline that temporarily affected cluster availability, which has been mitigated. A detailed Root Cause Analysis will be shared separately.

Read the full incident report →

Major April 21, 2026

UAE - Insights - Dashboard is down

Detected by Pingoru
Apr 21, 2026, 05:59 PM UTC
Resolved
Apr 21, 2026, 11:38 PM UTC
Duration
5h 39m
Affected: Insights
Timeline · 4 updates
  1. identified Apr 21, 2026, 05:59 PM UTC

    We have identified the cause of the outage impacting Dashboard for Insights in UAE and are working on a fix. Impact: Users may be unable to access the Insights dashboard. Next update: Our focus is on restoring service as quickly as possible.

  2. identified Apr 21, 2026, 08:18 PM UTC

    The cause of the outage impacting Dashboard for Insights in UAE has been identified, and we are working on a fix. Impact: Users may be unable to access the Insights dashboard. Next update: Our focus is on restoring service as quickly as possible.

  3. resolved Apr 21, 2026, 11:38 PM UTC

    The outage has been resolved and Insights dashboard in UAE is fully operational. Impact: No ongoing user impact.

  4. postmortem Apr 24, 2026, 07:42 PM UTC

    ## _Customer Impact_

    On **April 17, 2026**, the UiPath **Insights portal and Provisioning service** experienced an outage affecting customers in the **UAE region** following a scheduled deployment. During this period, users were unable to access any of the data in Insights dashboards. The impact began at approximately **12:00 UTC on April 17** and was fully resolved by **23:38 UTC on April 21, 2026**, resulting in a total service disruption of approximately **4 days and 11 hours**.

    ## _Root Cause_

    The outage was triggered by an issue in our deployment process for the UAE region. During a recent service deployment, a required step in our deployment workflow failed silently, which prevented the new version of the Insights portal from being fully activated in UAE. Over time, the older version that remained in service began referencing files that had since been removed as part of routine maintenance, causing the portal to become inaccessible to customers.

    ## _Detection_

    The issue was discovered through **UiPath Insights internal alerts** and **automated availability monitoring** in the UAE region. These signals indicated portal unavailability, prompting the engineering team to investigate the deployment pipeline and identify the root cause.

    ## _Response_

    Upon detecting the outage, the engineering team investigated the deployment pipeline and identified that a re-deployment was needed to restore service. However, the team encountered two additional issues that had to be resolved first: an outdated infrastructure component introduced a pipeline failure, and a misconfigured integration setting — left over from the initial UAE region setup — had been silently blocking deployments. On **April 21, 2026 at approximately 20:30 UTC**, both issues were resolved, allowing a successful re-deployment that fully restored the Insights portal in UAE by **23:38 UTC on April 21**.

    ## _Follow-Up_

    To prevent similar incidents in the future, we are taking the following steps:

    * **Establish a mandatory verification checklist for new region deployments** to ensure all configuration settings are correctly applied before going live.
    * **Ensure any incomplete deployment steps are treated as active incidents** and fully resolved before proceeding with subsequent releases.
    * **Strengthen monitoring and alerting coverage for recently launched regions** to ensure issues are detected and escalated promptly, regardless of region maturity.

Read the full incident report →

Minor April 21, 2026

US - Cloud Portal - Partial disruptions

Detected by Pingoru
Apr 21, 2026, 05:52 PM UTC
Resolved
Apr 21, 2026, 05:52 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 21, 2026, 05:52 PM UTC

    From approximately 14:20 UTC to 17:10 UTC, customers in the United States region may have encountered errors when signing in or navigating within the Cloud Portal. This was due to a temporary resource exhaustion on our traffic routing backend, which has been mitigated. We are in the process of increasing the resilience of the system to such situations, to prevent them from affecting customers. The underlying issue also affected some requests to the following services: Computer Vision, Notification Service.

Read the full incident report →

Major April 20, 2026

Delayed US - Autopilot for Everyone - Sign-in issues

Detected by Pingoru
Apr 20, 2026, 06:55 PM UTC
Resolved
Apr 20, 2026, 08:13 PM UTC
Duration
1h 18m
Affected: Autopilot for Everyone
Timeline · 4 updates
  1. investigating Apr 20, 2026, 06:55 PM UTC

    Users may be unable to sign in to Autopilot for Everyone, and may be unable to send messages even if already logged in.

  2. monitoring Apr 20, 2026, 07:13 PM UTC

    We have applied a mitigation and are seeing traffic return to normal. We will continue to monitor the situation as it improves

  3. resolved Apr 20, 2026, 08:13 PM UTC

    We have applied a mitigation and are seeing traffic return to normal. We will continue to monitor the situation as it improves

  4. postmortem Apr 20, 2026, 10:16 PM UTC

    ## Customer Impact

    Between approximately 6:00 pm UTC and 7:14 pm UTC on April 20, 2026, customers using the Autopilot for Everyone service in the United States Delayed region were unable to sign in or send chat messages. All attempts to access the service returned errors, rendering the service effectively unavailable. Customers with cached credentials were also unable to send messages, and no workaround was available during the outage. Scope: The impact was limited to customers accessing Autopilot for Everyone via Portal > AI Trust Layer and those using Autopilot for Everyone via the Assistant in the United States Delayed region.

    ## Root Cause

    A service update deployed to the Autopilot for Everyone service on April 20, 2026 triggered an unexpected modification to the traffic routing configuration in the United States Delayed infrastructure. The routing rules that direct incoming customer requests were altered to an unsupported configuration. This mismatch meant that the routing layer could not match incoming requests to any valid destination, resulting in HTTP 404 errors for all service traffic. Recovery was achieved by manually patching the routing configuration to include both the original correct address and the modified address, allowing incoming customer requests to be properly routed and restoring service availability. The exact mechanism by which the infrastructure-level process introduced the incorrect routing address has been determined, and we are working on a long-term fix.

    ## Detection

    The incident was first detected at 6:02 pm UTC on April 20, 2026, when automated monitoring reported failures in the Autopilot for Everyone service. Multiple alerts were triggered simultaneously, including service availability checks and automated browser-based tests targeting the affected region, all indicating that the service was unreachable. Engineers observed HTTP 404 errors on all service endpoints, including health checks, along with routing-level "no route" errors confirming that traffic could not reach the service.

    ## Response

    The service update that triggered the issue completed at approximately 6:00 pm UTC. The engineering team attempted to revert the change, but the rollback did not resolve the issue. Approximately 43 minutes elapsed between the deployment and the attempted rollback. The persistent errors after the revert prompted the team to escalate, formally declare an incident, and begin a coordinated response. Upon detection, our engineering team began investigating the routing configuration and identified that it referenced an incorrect internal address that did not match the address used by incoming customer traffic. To confirm the diagnosis, the team performed targeted tests against both the incorrect and correct addresses. Requests sent to the incorrect address returned successfully, while requests using the expected address failed — confirming that the routing configuration was functional but pointing to an unsupported destination. By 7:06 pm UTC, the team had formulated a plan to manually patch the routing configuration by adding the correct address alongside the existing incorrect entry. This additive approach was chosen as a safe, non-destructive fix that would restore service without risking disruption to any processes that might depend on the existing configuration. At 7:11 pm UTC, the routing configuration was patched. Health checks immediately returned successful responses, and automated service monitoring confirmed that the service was recovering. By 7:14 pm UTC, the incident was marked as mitigated and normal functionality was restored. The team continued monitoring the service for approximately one hour, confirming a full recovery with no further failures. The incident was marked as fully resolved at 8:14 pm UTC.

    ## Follow-up

    To prevent similar incidents in the future, we are implementing the following improvements:

    * Infrastructure process investigation: We have identified why an infrastructure-level process modified the routing configuration. We will follow up with safeguards to ensure that modifications are backwards-compatible with deployments running older releases.

    We are committed to making our systems more resilient and transparent. These improvements will help us detect configuration issues earlier, reduce recovery times, and deliver a more reliable experience for all customers.

Read the full incident report →

Notice April 20, 2026

Multiple Services degraded in US Region

Detected by Pingoru
Apr 20, 2026, 12:52 PM UTC
Resolved
Apr 20, 2026, 12:52 PM UTC
Duration
Timeline · 2 updates
  1. resolved Apr 20, 2026, 12:52 PM UTC

    Between 12:02 and 12:14 UTC, an issue affecting one of our core services, Orchestrator, caused disruption across multiple services. The incident was triggered by unexpected restarts in the underlying infrastructure supporting Orchestrator. We understand the impact this has had on our customers and apologize for the disruption. We will publish a detailed RCA, including action items to prevent a recurrence.

  2. postmortem Apr 24, 2026, 11:38 AM UTC

    ## Customer Impact

    Between 12:02 pm UTC and 12:15 pm UTC on April 20, 2026, customers in the U.S. region experienced elevated error rates and failed requests when using the Orchestrator service. During this approximately 13-minute window, automation workflows were interrupted and a portion of requests returned errors. Error rates began declining by 12:14 pm UTC and returned to normal by 12:15 pm UTC. Impact was limited to the Orchestrator service in the U.S. region. All other services and regions remained fully operational.

    ## Root Cause

    A routine infrastructure upgrade in the U.S. region restarted all servers supporting the Orchestrator service within a four-minute window—far more rapidly than intended. A safeguard designed to limit how many service instances can be unavailable at once was in place, but was not honored during the upgrade, so nearly all Orchestrator instances went offline simultaneously. Remaining capacity was insufficient to serve normal traffic, resulting in widespread errors until instances came back online. The exact mechanism by which the upgrade bypassed the safeguard is under deeper investigation.

    ## Detection

    Automated monitoring detected a sudden increase in errors at 12:05 pm UTC—approximately three minutes after the impact began. Engineers acknowledged the alerts and began investigating immediately. Customer reports arrived within the same window and corroborated the alerts.

    ## Response

    Engineers assembled on an incident call and quickly identified that all Orchestrator service instances had restarted nearly simultaneously, correlating with an ongoing infrastructure upgrade. Underlying database and service dependencies were verified as healthy, focusing the investigation on the upgrade process itself. The service recovered on its own as instances came back online.

    **April 20, 2026**

    * 12:12 pm UTC — Incident call assembled; the ongoing upgrade was identified as the cause.
    * 12:14 pm UTC — Error rates dropped below 2% as instances recovered.
    * 12:15 pm UTC — Service returned to normal operation.
    * 12:26 pm UTC — Full service health confirmed; remaining Orchestrator upgrades paused.
    * 12:52 pm UTC — Status page updated, and the incident was marked resolved shortly after.

    ## Follow-Up

    We are implementing improvements to reduce the likelihood of similar incidents.

    _Short-term improvements_

    * Investigate the exact mechanism by which the upgrade bypassed the restart safeguard, and add automated validation to enforce it during all maintenance activities.
    * Add pre-upgrade verification to confirm service capacity will be maintained before any changes are applied.
    * Add monitoring and alerts for unexpected patterns of simultaneous service restarts.

    _Long-term improvements_

    * Update operational procedures and team training to reinforce best practices for safe infrastructure upgrades.
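The "max unavailable at once" safeguard and the pre-upgrade capacity verification mentioned in the follow-up can be sketched in a few lines. This is a generic illustration, not UiPath's upgrade tooling; the function name, parameters, and instance counts are hypothetical.

```python
# Minimal sketch of a rolling-restart guard that honors a "max unavailable"
# budget before taking down the next batch of instances. Names and numbers
# are illustrative only.

def next_restart_batch(healthy: int, total: int, max_unavailable: int, batch_size: int) -> int:
    """Return how many instances may be restarted now without breaching the budget."""
    already_unavailable = total - healthy
    remaining_budget = max_unavailable - already_unavailable
    if remaining_budget <= 0:
        return 0  # budget exhausted: wait for restarted instances to become healthy
    return min(batch_size, remaining_budget)


# 40 instances, budget of 4 unavailable, 2 already down: restart at most 2 more.
print(next_restart_batch(healthy=38, total=40, max_unavailable=4, batch_size=5))  # -> 2

# The incident scenario: if an upgrade ignores the check and restarts nearly
# everything at once, the guard would report that no further restarts are allowed.
print(next_restart_batch(healthy=2, total=40, max_unavailable=4, batch_size=5))   # -> 0
```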

Read the full incident report →

Major April 19, 2026

Canada - Serverless Robots - Job Failures

Detected by Pingoru
Apr 19, 2026, 10:01 AM UTC
Resolved
Apr 19, 2026, 02:49 PM UTC
Duration
4h 48m
Affected: Serverless Robots
Timeline · 5 updates
  1. investigating Apr 19, 2026, 10:01 AM UTC

    Jobs running in the serverless environment in the Canada region may have experienced intermittent request failures, causing the entire job to fail.

  2. identified Apr 19, 2026, 11:06 AM UTC

    We have identified the fix and it is currently being deployed.

  3. identified Apr 19, 2026, 11:27 AM UTC

    We have mitigated the issue and are now applying a fix.

  4. monitoring Apr 19, 2026, 01:26 PM UTC

    The issue has been mitigated, and services are functioning as expected. We are continuing to monitor the environment to confirm full recovery.

  5. resolved Apr 19, 2026, 02:49 PM UTC

    The issue has been mitigated and the system is stable.

Read the full incident report →

Minor April 17, 2026

Multiple Regions - Cloud Robots- VM - VPN gateway

Detected by Pingoru
Apr 17, 2026, 11:10 AM UTC
Resolved
Apr 17, 2026, 02:28 PM UTC
Duration
3h 17m
Affected: Cloud Robots - VM
Timeline · 4 updates
  1. investigating Apr 17, 2026, 11:10 AM UTC

    We have identified an issue affecting customers who have a VPN gateway configured and are navigating through the UI. Other customer scenarios remain unaffected. Based on current observations, the overall impact is low.

  2. identified Apr 17, 2026, 12:14 PM UTC

    A fix has been identified and is currently being implemented.

  3. monitoring Apr 17, 2026, 01:40 PM UTC

    A fix has been deployed, and we are actively monitoring the system.

  4. resolved Apr 17, 2026, 02:28 PM UTC

    A fix has been deployed, and the system is stable.

Read the full incident report →

Major April 17, 2026

Multiple Regions - Automation Cloud - Partial Service Disruption

Detected by Pingoru
Apr 17, 2026, 09:29 AM UTC
Resolved
Apr 17, 2026, 12:28 PM UTC
Duration
2h 58m
Affected: Orchestrator
Timeline · 4 updates
  1. investigating Apr 17, 2026, 09:29 AM UTC

    A subset of operations across multiple services is intermittently failing across multiple regions.

  2. identified Apr 17, 2026, 10:39 AM UTC

    A fix has been identified and is currently being implemented.

  3. monitoring Apr 17, 2026, 10:47 AM UTC

    A fix has been deployed, and we are actively monitoring the system.

  4. resolved Apr 17, 2026, 12:28 PM UTC

    A fix has been deployed, and the system is stable.

Read the full incident report →

Major April 16, 2026

Multiple Regions - Document Understanding - Elevated Error Rates

Detected by Pingoru
Apr 16, 2026, 06:32 PM UTC
Resolved
Apr 16, 2026, 07:39 PM UTC
Duration
1h 6m
Affected: Document Understanding
Timeline · 4 updates
  1. investigating Apr 16, 2026, 06:32 PM UTC

    We are currently investigating an increase in failed requests affecting Document Understanding (DU) generative extraction capabilities in both the US and EU regions. Our team has identified the issue and is actively working on mitigation. During this time, users may experience higher error rates or incomplete responses when using generative extraction features. We will provide updates as we make progress toward full resolution.

  2. monitoring Apr 16, 2026, 07:23 PM UTC

    We have observed that error rates affecting Document Understanding (DU) generative extraction in both the US and EU regions have subsided. The system is currently stable, and our team continues to monitor performance closely to ensure full reliability. We will share further updates if there are any changes.

  3. resolved Apr 16, 2026, 07:39 PM UTC

    The issue affecting Document Understanding (DU) generative extraction in both the US and EU regions has been fully resolved. All systems are operating normally. We will continue to monitor to ensure ongoing stability.

  4. postmortem Apr 23, 2026, 06:36 PM UTC

    ## Customer Impact

    Between April 16, 2026 at 17:31 UTC and April 16, 2026 at 18:32 UTC, a subset of customers in the US and EU regions experienced elevated error rates and incomplete responses when using the Document Understanding service's generative extraction features. During this period, affected users encountered server errors (HTTP 500) when submitting documents for automated data extraction, resulting in approximately 650 failed requests across both regions, with the US region accounting for the majority. The period of elevated errors lasted approximately one hour before subsiding. Some affected organizations experienced error rates exceeding 70% at the peak of the incident, while others saw lower but still disruptive failure rates.

    ## Root Cause

    The incident was caused by an unexpected, temporary change in the response format from our external AI model provider's embeddings-related endpoint. This change broke the established interface contract between the provider and our Document Understanding service, preventing the service from correctly parsing embedding responses. Specifically, our service expected a structured response object containing a required attribute, but the provider returned an empty response (HTTP 200 with an empty body). This caused a parsing error when our service attempted to process the response body. Importantly, the provider's endpoint continued to return a successful HTTP status code, meaning the failure was not detectable at the network level and only surfaced during response processing within our service. The issue affected multiple unrelated organizations simultaneously across both regions and lasted approximately one hour, consistent with a temporary provider-side deployment or configuration change.

    ## Detection

    The incident was detected on April 16, 2026 at 17:44 UTC, when automated monitoring systems triggered alerts for increased counts of failed requests in the Document Understanding service's generative extraction features across both the US and EU regions. Analysis of service logs and error metrics revealed a sharp rise in server errors, with multiple organizations across both regions experiencing failures. The time between the onset of elevated error rates and detection was short, owing to real-time alerting and continuous monitoring of service health.

    ## Response

    Upon detection, the engineering team immediately convened via a dedicated call to investigate the root cause. A public status page was published at approximately 18:28 UTC informing customers that the team was investigating elevated error rates and incomplete responses in the generative extraction service across the US and EU regions. Initial efforts focused on isolating the affected regions and identifying patterns in failed requests. The team examined specific failed extraction requests, retrieved error artifacts for analysis, and compared them against successful reference cases to characterize the failure mode. Our engineering team ruled out all internal causes early in the investigation. No deployments or configuration changes had been made to our systems for approximately one week prior to the incident. The team confirmed through systematic log analysis that the failures were isolated to the embedding model endpoint, corroborating the external provider as the source of the disruption. A support ticket was filed with the provider the following day to obtain a detailed explanation of the change. Error rates began subsiding approximately one hour after onset. By 18:42 UTC, the team confirmed that the incident was no longer active, though investigation into the underlying cause continued. The public status page was updated at 19:19 UTC to reflect that error rates had returned to normal and the system was stable. The incident was declared fully resolved at 19:35 UTC, with all affected services operating normally. The status page was updated with a final resolution notice, and ongoing monitoring confirmed sustained stability.

    ## Follow-Up

    To reduce the risk of similar incidents and minimize customer impact from external dependency failures, we are implementing the following targeted improvements:

    1. **Fallback and redundancy strategies:** We are evaluating and, where feasible, implementing fallback mechanisms to alternative models or providers when a contract break or provider outage is detected, reducing the duration and severity of customer impact.
    2. **Improved diagnostic logging:** We are updating our systems to capture and retain raw provider responses for failed requests. This will enable faster forensic analysis during future incidents and eliminate the diagnostic gap identified during this investigation.
    3. **Expanded monitoring:** We are updating our suite of alerts to cover this scenario. This will significantly reduce detection time for similar incidents.

    These measures build on lessons learned from this and similar past events. We are committed to delivering reliable, resilient automation services and will continue to strengthen our systems to ensure that customers experience minimal disruption, even when external dependencies change unexpectedly.
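The diagnostic-logging and contract-validation follow-ups above can be illustrated with a minimal sketch of defensive response handling. This is a generic example, not UiPath's Document Understanding code; the response shape ("data" / "embedding" fields) and the exception name are assumptions that mirror a common embeddings-API layout and the empty-HTTP-200 failure mode described in the root cause.

```python
import logging

logger = logging.getLogger("embeddings-client")


class ProviderContractError(RuntimeError):
    """Raised when a provider response violates the expected schema."""


def parse_embedding_response(status_code: int, body: dict | None) -> list[float]:
    """Validate a provider embeddings response before using it.

    A 200 status alone is not proof of success: the incident above involved an
    HTTP 200 with an empty body. The "data"/"embedding" shape is illustrative only.
    """
    if status_code != 200:
        raise ProviderContractError(f"unexpected status {status_code}")
    if not body or not body.get("data"):
        # Retain the raw payload for forensics instead of failing opaquely.
        logger.error("empty or malformed embeddings payload: %r", body)
        raise ProviderContractError("embeddings response missing 'data'")
    return body["data"][0]["embedding"]


# Simulate the failure mode seen in the incident; a caller could fall back to an
# alternative model or provider when this exception is raised.
try:
    parse_embedding_response(200, {})
except ProviderContractError as exc:
    print(f"fallback path triggered: {exc}")
```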

Read the full incident report →

Major April 16, 2026

Multiple Regions - IXP - Incorrect Validation Error Message

Detected by Pingoru
Apr 16, 2026, 05:26 PM UTC
Resolved
Apr 16, 2026, 07:16 PM UTC
Duration
1h 50m
Affected: IXP
Timeline · 5 updates
  1. investigating Apr 16, 2026, 05:26 PM UTC

    We're currently seeing an issue where one IXP endpoint is displaying an incorrect validation error message for a small subset of older IXP projects following a recent change. This message is not accurate and can be safely disregarded, though it may temporarily impact parts of the IXP UI. Importantly, IXP ingestion is not affected, and no action is required from customers at this time. Our team is actively working on a hotfix, which will be deployed as soon as possible. We'll share an update once the fix is in place.

  2. identified Apr 16, 2026, 06:17 PM UTC

    A hotfix is currently being rolled out to address the incorrect validation error message affecting a small subset of older IXP projects. During this time, some users may continue to briefly see the incorrect message in the IXP UI. As a reminder, this does not impact IXP ingestion, and no customer action is required. We will confirm once the rollout is complete.

  3. monitoring Apr 16, 2026, 06:51 PM UTC

    A hotfix has now been successfully deployed to address the incorrect validation error message affecting a small subset of older IXP projects. We are actively monitoring to ensure stability.

  4. resolved Apr 16, 2026, 07:16 PM UTC

    A hotfix has now been successfully deployed to address the incorrect validation error message affecting a small subset of older IXP projects. The issue is resolved, and we are continuing to monitor.

  5. postmortem Apr 21, 2026, 07:33 PM UTC

    ## Customer Impact

    Between April 16, 2026 at 4:30 pm UTC and April 16, 2026 at 7:22 pm UTC, a subset of customers encountered incorrect validation error messages when accessing older projects in the IXP Communications Mining web application. Specifically, users saw erroneous error messages that did not reflect actual data issues. These messages may have caused confusion or disrupted normal workflows. Although the underlying change had been deployed across all regions, the observable impact was confined to a very small number of legacy datasets in IXP Communications Mining. Data processing and ingestion continued to function normally throughout the incident, and no customer action is required.

    ## Root Cause

    A recently deployed change introduced an error in the validation logic for one API endpoint. This change caused the system to display incorrect validation error messages, which surfaced as server errors. The root cause was a misconfiguration in the endpoint's validation rules that did not properly account for legacy project formats. Requests involving these older projects triggered false error messages in the interface. The underlying data processing and ingestion functionality remained unaffected; the issue was isolated to the error-display layer. Recovery was achieved by quickly developing and deploying a targeted hotfix to restore correct validation behavior for affected projects.

    ## Detection

    Anomalous error logging was identified at around 4:30 pm UTC, and reported by a customer around the same time, at which point engineers began investigating. By 5:21 pm UTC, the team confirmed that the errors were limited to legacy datasets and were manifesting as incorrect validation messages in the interface.

    ## Response

    Around 5:21 pm UTC, the team identified the issue as a validation logic misconfiguration affecting older projects, confirmed that data ingestion was unaffected, and updated the status page. The hotfix was built and rolled out globally. Full resolution was confirmed at 7:18 pm UTC, when the status page was updated to Resolved.

    ## Follow-up

    **Short-term**

    * Enhance automated testing to cover the behaviour of legacy IXP projects.
    * Enforce consistent serialisation logic across the codebase in a standard pattern to avoid unexpected edge cases, with a focus on testing that legacy data models predating later validation requirements can still be serialised.

Read the full incident report →

Critical April 15, 2026

Multiple services degraded due to auth failures in Europe region

Detected by Pingoru
Apr 15, 2026, 09:24 AM UTC
Resolved
Apr 15, 2026, 09:48 AM UTC
Duration
23m
Affected: Automation Cloud, Orchestrator, Automation Hub, AI Center, Action Center, Apps, Automation Ops, Computer Vision, Customer Portal, Data Service, Documentation Portal, Document Understanding, Insights, Integration Service, Marketplace, Process Mining, Task Mining, Test Manager, IXP, Serverless Robots, Studio Web, Solutions Management, Context Grounding, Autopilot for Everyone, Autopilot (Plugins), Agents, Agentic Orchestration, Autopilot for Developers, ScreenPlay, Cloud Robots - VM
Timeline · 4 updates
  1. investigating Apr 15, 2026, 09:24 AM UTC

    We are currently investigating auth issues in the Europe region causing failures across multiple services. Our engineering team has identified the root cause and is in the process of restoring full functionality.

  2. investigating Apr 15, 2026, 09:30 AM UTC

    We are currently investigating auth issues in the Europe region causing failures across multiple services. Our engineering team has identified the root cause and is in the process of restoring full functionality.

  3. monitoring Apr 15, 2026, 09:39 AM UTC

    Our engineering team has implemented a fix, and services are currently stabilising. We are monitoring the system to ensure stability and full recovery. Further updates will be shared soon.

  4. resolved Apr 15, 2026, 09:48 AM UTC

    The issue has been resolved. The system has remained stable during the monitoring period.

Read the full incident report →

Major April 15, 2026

Europe - Integration Service - Degraded Performance

Detected by Pingoru
Apr 15, 2026, 07:37 AM UTC
Resolved
Apr 15, 2026, 07:48 AM UTC
Duration
10m
Affected: Integration Service
Timeline · 3 updates
  1. investigating Apr 15, 2026, 07:37 AM UTC

    We are currently investigating an issue where customers in the Europe region are unable to create new Integration Service Connections. Existing connections are not impacted. Our engineering team is actively working to identify the root cause and restore full functionality.

  2. resolved Apr 15, 2026, 07:48 AM UTC

    The issue has been resolved. The system is stable.

  3. postmortem Apr 29, 2026, 08:49 AM UTC

    ## Customer Impact

    Between April 15, 2026 at 7:00 am UTC and April 15, 2026 at 7:50 am UTC, customers in the Europe region were unable to create new connections using the Integration Service. Existing connections and all other regions remained fully operational throughout the incident. Customers who attempted to create new connections received error messages. A public status page update was published at 7:40 am UTC to keep affected customers informed.

    ### Scope

    Only customers in the Europe region who attempted to create new Integration Service connections during this window were affected. All other regions and existing connections continued to function normally.

    ## Root Cause

    A recent backend configuration update to the Integration Service in the Europe region required corresponding database schema updates to be applied as part of the same deployment. Due to a conditional check in the deployment pipeline (introduced by an earlier, unrelated change), the schema migration step was skipped during rollout. As a result, the service ran against a schema that did not match the expected configuration, causing new connection creation requests to fail with internal errors. Existing connections were unaffected because they did not rely on the updated schema paths.

    Once the team identified the mismatch, the deployment was rolled back, which restored the service to its previous working state, and normal operation resumed. Analysis of service logs and error patterns confirmed the root cause, with failures aligning precisely to the timing and scope of the deployment.

    ## Detection

    Automated monitoring detected the incident at 7:24 am UTC when error rates for new connection attempts in the Europe region exceeded normal thresholds. The alert was acknowledged within one minute, and incident response procedures began immediately. By 7:25 am UTC, the responsible team had assembled and initiated their investigation. The interval between the onset of customer impact and detection was under a minute, enabling a rapid response.

    ## Response

    At 7:05 am UTC, engineers received the automated alert and joined the incident response call to investigate error logs and recent deployments to the Integration Service. By 7:15 am UTC, the team had scoped the issue to new connection attempts in the Europe region and identified the recent deployment as the likely cause. At 7:33 am UTC, the incident was formally classified as customer-impacting. At 7:40 am UTC, a public status page update was published. By 7:48 am UTC, the team had confirmed the root cause, rolled back the deployment, and verified that new connection creation had returned to normal. Monitoring confirmed full recovery by 7:51 am UTC, and the incident was declared resolved.

    ## Follow-up

    To prevent similar incidents, we are implementing the following improvements:

    * **Deployment pipeline fix:** Correcting the pipeline condition that caused the database schema migration step to be skipped, and adding guardrails to ensure required schema updates are always applied alongside the corresponding service changes.
    * **Pre-deployment validation:** Adding automated pre-deployment checks that verify service and schema compatibility, so mismatches are caught before any customer-facing rollout (a sketch of this kind of check appears below).
    * **Canary traffic coverage:** We are adding automated test calls that exercise the new connection API against canary instances on every deployment, so schema or configuration mismatches are caught in the canary phase regardless of live traffic levels.
    * **Enhanced connection monitoring:** Expanding monitoring to include targeted alerts for failed new-connection attempts, enabling even faster detection of issues affecting new connections.

    We are also reviewing recent changes to our deployment and validation processes to identify additional safeguards. We sincerely appreciate your patience and understanding as we continue to work to make our services more resilient and dependable.
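
As a hedged illustration of the pre-deployment validation item above: the sketch below is hypothetical (UiPath's pipeline, schema-versioning scheme, and migration names are not public). It simply shows one way a rollout could be blocked when the build's required database migrations have not been applied in the target region, which was the failure mode in this incident.

```python
# Hypothetical pre-deployment guardrail: refuse to roll out a build whose required
# database migrations have not been applied in the target region. All names and
# migration IDs below are illustrative; this is not UiPath's actual tooling.

from dataclasses import dataclass


@dataclass
class DeploymentPlan:
    region: str
    expected_migrations: set   # migrations the new service build requires
    applied_migrations: set    # migrations already present in the region's database


def validate_schema_compatibility(plan: DeploymentPlan) -> None:
    """Raise before rollout if any required migration is missing."""
    missing = plan.expected_migrations - plan.applied_migrations
    if missing:
        raise RuntimeError(
            f"[{plan.region}] blocking deployment: missing migrations {sorted(missing)}"
        )
    print(f"[{plan.region}] schema check passed; proceeding with rollout")


if __name__ == "__main__":
    plan = DeploymentPlan(
        region="Europe",
        expected_migrations={"20260414_add_connection_config", "20260414_widen_token_column"},
        applied_migrations={"20260414_add_connection_config"},
    )
    try:
        validate_schema_compatibility(plan)
    except RuntimeError as err:
        print(err)  # the April 15 failure mode, caught before any customer traffic is affected
```

The same comparison, run as a post-deploy smoke test against canary instances (for example by exercising the connection-creation API), would cover the canary item above even when live canary traffic is low.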

Read the full incident report →

Notice April 14, 2026

Multi-Region Failures on start jobs relying on Serverless runtimes

Detected by Pingoru
Apr 14, 2026, 11:22 AM UTC
Resolved
Apr 14, 2026, 11:22 AM UTC
Duration
0m
Timeline · 2 updates
  1. resolved Apr 14, 2026, 11:22 AM UTC

    We observed failures in starting jobs that rely on serverless runtimes across multiple regions. As a result, job executions on serverless runtimes were impacted. The fix has been implemented and is stable.

  2. postmortem Apr 16, 2026, 12:52 PM UTC

    ## **Customer Impact**

    Between April 14, 2026 at 10:02 am UTC and April 14, 2026 at 11:05 am UTC, a significant number of customers were unable to start jobs that relied on Serverless runtimes. During this period, attempts to initiate Serverless jobs (including scheduled executions, debug sessions, and automation app executions) failed with errors, resulting in automation workflows not running as expected. Customers may have seen error messages (such as conflict errors on job start requests) or found debug functionality unavailable within their automation environments.

    This incident affected customers across regions globally using Serverless automation features in our cloud platform. The primary disruption lasted approximately one hour and three minutes. Some customers may have experienced brief residual effects on debug functionality in our web-based design tools shortly after the main issue was resolved.

    ## **Root Cause**

    The incident was triggered by an erroneous update to a configuration setting that controls the availability of Serverless runtimes. The intent behind the change was to disable Serverless and related compute functionality in a specific environment where those services are not deployed. However, the configuration that was applied contained a leftover negation clause. Because this clause was not removed, the resulting configuration inverted the intended logic, disabling Serverless runtimes across nearly all regions instead of only in the targeted environment. As a result, all requests to start jobs that depended on Serverless runtimes were rejected by the platform, returning error responses. This affected serverless workloads, debug jobs, standard automation jobs, and app executions. The misconfiguration propagated rapidly across all affected regions, as our platform services automatically refresh their configuration settings at frequent intervals.

    Once the error was identified through a review of the recent configuration change, the negation clause was removed and the corrected configuration was redeployed. Our platform services picked up the fix within their regular refresh cycle, restoring Serverless job execution capability across all regions. Full recovery was confirmed shortly after the corrected configuration was deployed. (A simplified sketch of how this kind of rule inversion can be caught before rollout appears below.)

    ## **Detection**

    Initial automated alerts were generated at 10:06 UTC. However, correlating these alerts to a customer-impacting issue was delayed by a high volume of concurrent alert activity. The engineering team confirmed the impact and began an active investigation at 10:50 UTC.

    ## **Response**

    Upon engagement at 10:50 am UTC, engineers began investigating the root cause. By 10:54 am UTC, the team had identified a recent configuration change as a likely cause. The configuration was reviewed on an incident response call, and the erroneous negation clause was confirmed as the source of the problem: it had inverted the intended logic, disabling Serverless runtimes globally rather than in a single targeted environment. At 10:56 am UTC, a corrective update to the configuration was prepared, and by approximately 10:59 am UTC the fix was deployed. Our platform services, which automatically refresh configuration settings every ten seconds, began picking up the corrected setting shortly thereafter. By approximately 11:03 am UTC, the team observed successful job start responses from the platform, confirming that mitigation was taking effect.

    By 11:05 am UTC, monitoring confirmed that Serverless job execution had been fully restored across all regions, and customers were once again able to start jobs as expected. Throughout the response, the team verified recovery through both service metrics and direct testing of job execution. A status page update and impact summary were also prepared for affected customers.

    ## **Follow-up**

    To prevent similar incidents in the future, we are implementing several targeted improvements:

    1. **Remove the capability to disable a service via feature flag**: We are removing the ability to disable a service through a simple feature flag and will instead rely on static service configuration that follows our Secure Deployment Principles through ringed rollouts.
    2. **Stronger review and governance processes**: We are improving documentation, peer review requirements, and governance controls for configuration changes that affect service availability, ensuring that changes with broad impact receive appropriate scrutiny before deployment.
    3. **Faster detection through improved monitoring**: We are enhancing monitoring and alerting to detect abnormal drops in job execution success rates within minutes, reducing detection time for service-impacting issues. This includes reviewing how existing health checks interact with traffic-based alerting to eliminate blind spots that delayed detection in this incident.

    We understand how disruptive this incident was to your automation workflows, and we sincerely apologize for the impact. These improvements are already underway, and we are committed to learning from this event. We will continue to invest in the reliability and resilience of our platform to better support your business.
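
The root cause here (a leftover negation clause that inverted a targeting rule) is the kind of change a pre-rollout simulation can catch. The sketch below is hypothetical: the flag system, rule syntax, region list, and environment names are invented for illustration, and the actual remediation described above goes further by removing flag-based disablement entirely. It simply evaluates a proposed rule against every region and compares the outcome with the operator's stated intent, before a fast configuration-refresh cycle can propagate the change.

```python
# Hypothetical pre-rollout check for a feature-flag change: evaluate the proposed
# targeting rule against every region and compare the result with the stated intent.
# Rule syntax, regions, and environment names are illustrative, not UiPath's.

REGIONS = ["Europe", "United States", "Australia", "Japan", "Isolated-Test-Env"]


def serverless_enabled(region: str, rule: dict) -> bool:
    """Tiny rule model: {'disable_in': [regions], 'negate': bool}."""
    matched = region in rule["disable_in"]
    if rule.get("negate"):       # a leftover negation clause inverts the match
        matched = not matched
    return not matched           # the runtime stays enabled unless the rule disables it


def validate_change(rule: dict, intended_disabled: set) -> None:
    """Block the change if it disables a different set of regions than intended."""
    actual_disabled = {r for r in REGIONS if not serverless_enabled(r, rule)}
    if actual_disabled != intended_disabled:
        raise RuntimeError(
            f"rule disables {sorted(actual_disabled)}, but intent was {sorted(intended_disabled)}"
        )
    print("flag change matches intent; safe to roll out")


if __name__ == "__main__":
    intent = {"Isolated-Test-Env"}  # disable Serverless only where it is not deployed
    bad_rule = {"disable_in": ["Isolated-Test-Env"], "negate": True}  # inverted logic
    try:
        validate_change(bad_rule, intent)
    except RuntimeError as err:
        print(err)  # flags the April 14 style inversion before the config refresh spreads it
```

A check like this only guards against mismatched intent; the remediation committed to above (static configuration plus ringed rollouts) removes the single global switch altogether.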

Read the full incident report →

Notice April 13, 2026

IXP - All Regions

Detected by Pingoru
Apr 13, 2026, 06:31 PM UTC
Resolved
Apr 13, 2026, 06:31 PM UTC
Duration
Timeline · 1 update
  1. resolved Apr 13, 2026, 06:31 PM UTC

    At approximately 12:30 UTC on April 10th, a change was deployed that caused a subset of EWS integrations to gradually degrade over time. The impact was not immediate and varied across integrations. At around 17:00 UTC, we detected degraded Exchange performance and began investigating. A hotfix was developed and deployed over the following hours. Full service was restored by approximately 20:30 UTC on April 10th.

Read the full incident report →

Looking to track UiPath downtime and outages?

Pingoru polls UiPath's status page every 5 minutes and alerts you the moment it reports an issue — before your customers do.

  • Real-time alerts when UiPath reports an incident
  • Email, Slack, Discord, Microsoft Teams, and webhook notifications
  • Track UiPath alongside 5,000+ providers in one dashboard
  • Component-level filtering
  • Notification groups + maintenance calendar
Start monitoring UiPath for free

5 free monitors · No credit card required