- Detected by Pingoru
- Apr 13, 2026, 06:09 PM UTC
- Resolved
- Apr 13, 2026, 11:23 PM UTC
- Duration
- 5h 13m
Affected: Insights
Timeline · 6 updates
-
investigating Apr 13, 2026, 06:09 PM UTC
We are investigating reports of delayed data impacting orchestration tables for Insights in the United States. Our teams are actively working to identify the root cause and will share updates as more information becomes available.
-
investigating Apr 13, 2026, 07:16 PM UTC
We are continuing to investigate reports of delayed data impacting orchestration tables for Insights in the United States. Our teams remain actively engaged in identifying the root cause. We will provide further updates as more information becomes available.
-
identified Apr 13, 2026, 08:16 PM UTC
We have identified the issue and are currently rolling out a fix. Our teams remain actively engaged, and we will provide further updates as more information becomes available.
-
identified Apr 13, 2026, 09:24 PM UTC
We have identified the issue and are currently rolling out a fix. Our teams remain actively engaged, and we will provide further updates as more information becomes available.
-
identified Apr 13, 2026, 10:36 PM UTC
We have identified the issue and are continuing to roll out a fix. Our teams remain actively engaged, and we will provide further updates as more information becomes available.
-
resolved Apr 13, 2026, 11:23 PM UTC
The issue has been identified and resolved. Data delays impacting orchestration tables for Insights in the United States have been addressed. Our teams will continue to monitor to ensure stability. Thank you for your patience.
- Detected by Pingoru
- Apr 09, 2026, 08:12 AM UTC
- Resolved
- Apr 09, 2026, 09:15 AM UTC
- Duration
- 1h 2m
Affected: Marketplace
Timeline · 3 updates
-
investigating Apr 09, 2026, 08:12 AM UTC
We are currently investigating this issue.
-
monitoring Apr 09, 2026, 08:53 AM UTC
Services have recovered and are currently stable. We are continuing to monitor while investigating the root cause and implementing a long-term fix. Thank you for your patience.
-
resolved Apr 09, 2026, 09:15 AM UTC
This incident has been resolved.
- Detected by Pingoru
- Apr 08, 2026, 08:36 AM UTC
- Resolved
- Apr 08, 2026, 10:49 AM UTC
- Duration
- 2h 12m
Affected: Studio Web
Timeline · 4 updates
-
investigating Apr 08, 2026, 08:36 AM UTC
We are investigating the issue.
-
identified Apr 08, 2026, 09:32 AM UTC
We have identified the issue, and a fix is currently being implemented.
-
monitoring Apr 08, 2026, 10:15 AM UTC
A fix has been deployed and we are actively monitoring to confirm the issue is fully resolved.
-
resolved Apr 08, 2026, 10:49 AM UTC
Monitoring has confirmed stable behavior and the issue is now fully resolved.
- Detected by Pingoru
- Apr 04, 2026, 01:26 AM UTC
- Resolved
- Apr 04, 2026, 01:26 AM UTC
- Duration
- —
Timeline · 1 update
-
resolved Apr 04, 2026, 01:26 AM UTC
Between 12:00 AM and 12:45 AM UTC on April 4, 2026, a database outage in the North Europe region caused an interruption to trace and audit ingestion. Data ingestion may have been impacted for services using LLMOps: telemetry trace ingestion may have failed, and LLM inputs and outputs were not recorded in audit logs during the outage. Service has since been restored and is healthy, and no further ingestion issues have been observed.
- Detected by Pingoru
- Apr 02, 2026, 09:47 PM UTC
- Resolved
- Apr 03, 2026, 01:23 AM UTC
- Duration
- 3h 36m
Affected: Action Center
Timeline · 4 updates
-
investigating Apr 02, 2026, 09:47 PM UTC
We are investigating customer reports of errors with Action Center.
-
identified Apr 02, 2026, 10:34 PM UTC
Our team has identified the issue and is rolling out a fix.
-
identified Apr 02, 2026, 11:53 PM UTC
We are still working to roll out the fix to all regions while testing to confirm it is working as expected.
-
resolved Apr 03, 2026, 01:23 AM UTC
On further investigation, we determined that the issue was related to the earlier Automation Ops outage, which is now resolved. Signing out and signing back in should restore access for customers.
- Detected by Pingoru
- Apr 02, 2026, 07:11 PM UTC
- Resolved
- Apr 02, 2026, 08:33 PM UTC
- Duration
- 1h 21m
Affected: Automation Ops
Timeline · 4 updates
-
investigating Apr 02, 2026, 07:11 PM UTC
AO-Governance in the EUS region is experiencing elevated error rates. Our team is currently investigating.
-
monitoring Apr 02, 2026, 07:53 PM UTC
From approximately 11:00 AM to 12:30 PM PST, the governance service in East US experienced resource exhaustion; UiPath generative AI services may have encountered errors during this time. Our team has scaled resources as a mitigation, and investigation is ongoing.
-
resolved Apr 02, 2026, 08:33 PM UTC
This incident is mitigated.
-
postmortem Apr 09, 2026, 06:20 PM UTC
## Customer Impact
Between April 2, 2026, 18:01 UTC and 19:19 UTC (approximately 78 minutes), customers in the US region may have experienced intermittent errors and increased latency when using AI and automation services dependent on governance policy evaluations.
## Root Cause
At 18:00 UTC, a scheduled batch job initiated policy evaluations across multiple organizations, generating three times the normal request volume. This sudden surge placed heavy pressure on the governance database. Consequently, services waiting for database responses timed out and automatically retried their requests. Because the original queries were still consuming database resources (CPU and I/O) in the background, these immediate retries created an amplification loop that ultimately led to database resource exhaustion.
## Detection
Automated monitoring detected the increased error rates within the governance service and immediately alerted our engineering team, who began investigating right away.
## Response
Telemetry analysis confirmed database resource exhaustion as the root cause. To mitigate the issue and break the retry amplification loop, the engineering team immediately doubled the database's processing capacity. Following this action, error rates quickly subsided, and the governance service resumed normal processing.
## Follow-Up
To prevent this issue from recurring, our engineering teams are prioritizing the following actions:
* **Implement Caching:** Introduce caching mechanisms for repeated governance policy lookups to significantly reduce database load during high-volume evaluations.
* **Upgrade Infrastructure:** Migrate the database to an elastically scalable model to dynamically handle sudden spikes in traffic, replacing fixed capacity ceilings.
* **Enhance System Resilience:** Implement intelligent retry backoff intervals and "circuit breakers" on calling services to prevent future retry amplification loops (see the sketch below).
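The last follow-up item is the core resilience pattern here. Below is a minimal sketch of backoff-with-jitter plus a circuit breaker; the names (`CircuitBreaker`, `evaluate_policy`, the injected `run_query` callable) are illustrative assumptions, not UiPath's actual client code:

```python
import random
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; sheds load for `cooldown` seconds."""
    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: let one probe request through after the cooldown.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def evaluate_policy(query, run_query, breaker: CircuitBreaker, attempts: int = 4):
    """Retry with exponential backoff and jitter instead of immediate retries."""
    for attempt in range(attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: not retrying into an overloaded database")
        try:
            result = run_query(query)  # injected database call (illustrative)
            breaker.record(ok=True)
            return result
        except TimeoutError:
            breaker.record(ok=False)
            # Sleep 0.1s * 2^attempt, randomized so retries from many callers don't align.
            time.sleep(0.1 * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("policy evaluation failed after retries")
```

The key property is that a timed-out query is never retried immediately while the original is still consuming database CPU and I/O, which is exactly the amplification loop described above.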
- Detected by Pingoru
- Apr 02, 2026, 07:07 PM UTC
- Resolved
- Apr 02, 2026, 07:42 PM UTC
- Duration
- 35m
Affected: Agents
Timeline · 4 updates
-
investigating Apr 02, 2026, 07:07 PM UTC
Some customers may see degraded conversations in the US, EU, and Singapore regions. We are currently investigating the issue.
-
investigating Apr 02, 2026, 07:08 PM UTC
Some customers may see degraded conversations in the US, EU, and Singapore regions. We are currently investigating the issue.
-
monitoring Apr 02, 2026, 07:41 PM UTC
We have identified the issue as originating from downstream services. We have mitigated the downstream issue and are currently monitoring.
-
resolved Apr 02, 2026, 07:42 PM UTC
Issue is now resolved.
- Detected by Pingoru
- Mar 31, 2026, 08:42 AM UTC
- Resolved
- Mar 31, 2026, 10:09 AM UTC
- Duration
- 1h 27m
Affected: Data Service
Timeline · 4 updates
-
investigating Mar 31, 2026, 08:42 AM UTC
We are currently experiencing an outage affecting UiPath Data Service across multiple regions. Our engineering team is actively investigating the root cause and working to restore full functionality as quickly as possible.
-
monitoring Mar 31, 2026, 09:03 AM UTC
The team has rolled out a fix and is currently monitoring service health.
-
resolved Mar 31, 2026, 10:09 AM UTC
The issue is fixed and the service is healthy.
-
postmortem Apr 02, 2026, 08:11 AM UTC
# Customer Impact
On March 31, 2026, from 4:50 AM to 8:50 AM UTC, a small subset of customers experienced HTTP 500 errors when using the Data Service Entity Query API in multiple regions. The issue occurred only for requests containing a user-defined filter configuration when the API was used directly, distinguishing it from filters generated by platform consumption surfaces (e.g., Activities). For affected customers, queries with this specific payload failed, while all other requests continued to function normally. No data loss or data inconsistency occurred.
# Root Cause
The issue was triggered for a subset of requests with a specific payload pattern ("filterGroups": [null]) that was valid but not handled correctly by the system. A recently introduced logging component did not properly handle this input. As a result, it failed during processing and caused the API to return an HTTP 500 error. This exposed a gap in how the system handles certain uncommon but valid inputs. Additionally, because this logging runs as part of the request flow, its failure impacted the overall request instead of being isolated.
# Detection
The issue was initially identified through an automated Sev2 alert and was also reported by a customer. During the investigation, the engineering team identified a common payload pattern across failing requests and traced the issue to the query analytics telemetry path.
# Response
Upon detection, the team identified the affected execution path and applied mitigation by disabling the related functionality via a configuration change. This action immediately stopped the errors and restored normal API behavior. The issue was mitigated within 30 minutes.
# Follow-Up
The following actions are being taken to prevent a recurrence (see the sketch below):
* Add safeguards to ensure logging code paths do not affect request execution
* Align and tighten the API documentation to specify the correct payloads, then add API-level validation to reject semantically invalid filter structures with 4xx responses
* Add a test case to validate that this exact failing scenario is now handled correctly
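As a concrete illustration of the first two follow-ups, here is a minimal sketch; the function names and the `BadRequest` type are hypothetical, not the Data Service codebase:

```python
import logging

logger = logging.getLogger("query-analytics")

class BadRequest(Exception):
    """Maps to an HTTP 400 response in the web layer (hypothetical type)."""

def validate_filter_groups(payload: dict) -> None:
    """Reject semantically invalid filters, e.g. {"filterGroups": [null]}, with a 4xx."""
    groups = payload.get("filterGroups")
    if groups is None:
        return  # omitting filters entirely is valid
    if not isinstance(groups, list) or any(g is None for g in groups):
        raise BadRequest("filterGroups must be a list of non-null filter objects")

def record_query_analytics(payload: dict) -> None:
    """Telemetry is best-effort: a failure here must never turn into an HTTP 500."""
    try:
        logger.info("entity query filters: %s", payload.get("filterGroups"))
    except Exception:
        # Swallow and report the telemetry failure; the request itself proceeds.
        logger.exception("query analytics failed; request continues unaffected")
```

Validation fails fast with a client error, and the telemetry path is fenced off so its exceptions cannot propagate into the request flow.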
- Detected by Pingoru
- Mar 30, 2026, 10:12 AM UTC
- Resolved
- Mar 30, 2026, 06:26 PM UTC
- Duration
- 8h 13m
Affected: Agents
Timeline · 5 updates
-
investigating Mar 30, 2026, 10:12 AM UTC
UiPath Agents newly deployed in Canada, Japan, and the US will not start unless a serverless machine template is assigned to the folder. We are working on a fix; in the meantime, customers can mitigate this issue manually by adding a serverless machine template to the folder.
-
identified Mar 30, 2026, 11:36 AM UTC
The team is working on deploying a fix; in the meantime, customers can mitigate this issue manually by adding a serverless machine template to the folder.
-
identified Mar 30, 2026, 01:34 PM UTC
The team is making progress on the fix deployment; in the meantime, customers can mitigate this issue manually by adding a serverless machine template to the folder.
-
resolved Mar 30, 2026, 06:26 PM UTC
Issue is now resolved.
-
postmortem Apr 01, 2026, 03:49 PM UTC
## Customer impact
We understand that this incident disrupted your ability to deploy new agents, and we apologize for the impact on your operations. Between March 25, 2026 at 6:21 am UTC and March 30, 2026 at 6:00 pm UTC—a period of approximately five days—customers in the EU, US, and Asia regions were unable to deploy new agents in certain folders following a transition to the Serverless Machine Runtime. Existing agents continued to operate normally throughout the incident.
If you attempted to deploy a new agent in a folder without a Serverless Machine Runtime provisioned, you encountered errors and were unable to proceed. A workaround was available throughout: you could manually provision a Serverless Machine Runtime at the folder level to immediately unblock new deployments. For organizations where enabling the Serverless Machine Runtime was not viable, our engineering team restored the previous runtime configuration to resume your operations.
## Root cause
A recent change transitioned newly deployed agents to run on the Serverless Machine Runtime instead of the previous runtime. This introduced a requirement for a Serverless Machine Runtime to be provisioned at the folder level before new agents could execute. This requirement was not intended.
The system already contained fallback logic for other workloads—such as API Workflows—that would automatically locate or provision a Serverless Machine Runtime when one was not explicitly assigned at the folder level. However, this fallback logic was not applied to agents during the transition, creating a gap in the expected behavior. As a result, any folder without an explicitly provisioned Serverless Machine Runtime could not run newly deployed agents. Existing agents and folders with templates already assigned remained unaffected because they continued to use their previously configured runtime.
## Detection
The issue was first reported by a customer on March 25, 2026 at 6:21 am UTC. Our engineering team was engaged the following day and identified the root cause, approximately one day after the initial report.
We recognize the gap between the issue first surfacing on March 25 and the formal incident declaration approximately five days later on March 30, 2026 at approximately 10:00 am UTC as an area for improvement. While the workaround was communicated during this period, we should have notified you through our status page sooner to ensure you had the information needed to make decisions about your environment.
## Response
Following the initial reports on March 25, our engineering team provided workarounds to affected customers. For those of you who were unable to provision Serverless Machine Runtimes, our engineering team restored the previous runtime configuration to resume your operations. Our engineering team identified the root cause and developed a fix on March 26, 2026, then validated it in a testing environment by March 27, 2026. On March 30, 2026 at approximately 10:00 am UTC, we declared a formal incident to coordinate the deployment of the fix across all affected regions.
The fix ensures that if no Serverless Machine Runtime is found at the folder level, one is automatically selected from the tenant level or provisioned if none exists (see the sketch after this report). We deployed the fix progressively, validated it at each stage, and provided status updates throughout the day. We achieved full recovery by March 30, 2026 at 6:00 pm UTC, when new agent deployments resumed normal operation across all regions without manual intervention.
## Follow up
To prevent similar issues in the future, we are expanding our regression testing to validate that new releases properly account for all configuration, deployment, and licensing scenarios—including cases where optional infrastructure components have not been explicitly provisioned at the folder level. We are adding automated test cases that specifically verify fallback behavior across all workload types whenever runtime requirements change, ensuring that agents, API Workflows, and other workloads are validated consistently before any release reaches your environment.
We are also enhancing our monitoring and alerting systems to detect deployment failures of this nature proactively, without relying on customer reports. This includes automated checks that verify new agent deployments succeed across a representative sample of folder configurations immediately after each release, allowing us to identify and resolve issues before they affect you.
Finally, we are reviewing our escalation and communication processes to ensure that when issues are identified, you are notified promptly through our status page. Closing the gap between issue detection and customer communication is a priority so that you have the information you need to take action as quickly as possible.
These improvements directly address the gaps this incident exposed—in testing coverage, proactive detection, and customer communication—and will help ensure that future runtime transitions do not disrupt your agent deployments.
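A minimal sketch of the fallback order the fix restored, under assumed types (`Folder`, `Tenant`, `Runtime` are illustrative, not the actual service model): folder-level runtime first, then tenant level, then automatic provisioning.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Runtime:
    name: str

@dataclass
class Tenant:
    runtimes: list = field(default_factory=list)

    def find_serverless_runtime(self) -> Optional[Runtime]:
        return self.runtimes[0] if self.runtimes else None

    def provision_serverless_runtime(self) -> Runtime:
        runtime = Runtime("auto-provisioned")
        self.runtimes.append(runtime)
        return runtime

@dataclass
class Folder:
    runtime: Optional[Runtime] = None

def resolve_serverless_runtime(folder: Folder, tenant: Tenant) -> Runtime:
    # 1) An explicit folder-level assignment always wins.
    if folder.runtime is not None:
        return folder.runtime
    # 2) Otherwise reuse a runtime already available at the tenant level.
    existing = tenant.find_serverless_runtime()
    if existing is not None:
        return existing
    # 3) Last resort: provision one automatically. Skipping this step for
    #    agents is the gap that caused the incident.
    return tenant.provision_serverless_runtime()
```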
- Detected by Pingoru
- Mar 25, 2026, 06:27 PM UTC
- Resolved
- Mar 26, 2026, 03:45 AM UTC
- Duration
- 9h 17m
Affected: Agents
Timeline · 5 updates
-
investigating Mar 25, 2026, 06:27 PM UTC
Agent Builder might not load for some customers. Impact is seen in multiple regions. The current workaround is to disable cache in the browser and refresh the page, while we are investigating the root cause.
-
identified Mar 25, 2026, 06:41 PM UTC
We are actively working on a fix to ensure the cache is properly invalidated upon deployment. We'll share another update once the fix is ready to ship.
-
identified Mar 25, 2026, 08:07 PM UTC
We are testing the fix and will share another update once the fix is ready to ship.
-
resolved Mar 26, 2026, 03:45 AM UTC
The issue has been resolved.
-
postmortem Mar 27, 2026, 08:20 PM UTC
## Customer Impact
Following a deployment, some customers experienced issues where the **Agent Builder frontend failed to load**.
* **8 organizations** encountered the issue multiple times
* **65 organizations** experienced the issue exactly once

The issue was self-resolving upon clearing browser cache and refreshing the page. No further occurrences were observed after mitigation, and the status page was updated and resolved.
## Root Cause
The incident was caused by **browser caching of a stale remote entry file** (remoteEntry.js) following a deployment. After the deployment, the remote entry file was updated on the server, but browsers continued to serve the previously cached version. The stale cached file referenced assets that no longer existed on the server, resulting in **404 errors** when the browser attempted to load them, causing the Agent Builder frontend to fail to load. The cache was not properly invalidated upon deployment, meaning the problem only appeared immediately after a deployment and resolved itself as browser caches expired.
## Detection
The issue was **reported by a customer**. The failed requests returned 404 status codes, which **did not trigger any existing alerts** on our side. This represents a monitoring gap: 404 errors on frontend asset requests were not covered by our alerting rules, meaning the issue went undetected until a customer reported it.
## Response
Upon detection:
1. **The engineering team identified the root cause** as stale browser caching of the remoteEntry.js file after the deployment.
2. **Immediate mitigation** was communicated: affected customers could resolve the issue by disabling cache and refreshing the page.
3. **Deployments were paused** to prevent further occurrences until a fix was merged into the release branch.
4. **The status page was updated and resolved** once the root cause was confirmed and mitigation verified.
## Follow-Up
To prevent similar incidents in the future, we are taking the following steps:
* **Deployment process improvement**: Ensuring that future deployments properly invalidate cached frontend assets to prevent stale file references (see the sketch below).
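To make that follow-up concrete, here is one common shape such a fix can take (a sketch with assumed helper names, not UiPath's actual frontend pipeline): the stable-URL entry file is always revalidated, while every asset it references gets a content-hashed, immutable URL.

```python
import hashlib

def hashed_asset_url(path: str, content: bytes) -> str:
    """Content-hashed filename: each deployment yields new URLs, so stale caches miss."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = path.rpartition(".")
    return f"{stem}.{digest}{dot}{ext}" if stem else f"{path}.{digest}"

def cache_headers_for(path: str) -> dict:
    if path.endswith("remoteEntry.js"):
        # The entry point keeps a stable URL, so the browser must revalidate
        # it on every load instead of trusting a cached copy.
        return {"Cache-Control": "no-cache"}
    # Hashed assets are immutable and safe to cache long-term.
    return {"Cache-Control": "public, max-age=31536000, immutable"}

print(hashed_asset_url("chunk-main.js", b"bundle contents"))  # -> chunk-main.<12 hex chars>.js
```

With this split, a stale remoteEntry.js can never reference missing assets: either the entry is revalidated, or the asset URL it names still exists because hashed URLs are never reused.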
- Detected by Pingoru
- Mar 25, 2026, 05:23 PM UTC
- Resolved
- Mar 25, 2026, 06:11 PM UTC
- Duration
- 47m
Affected: Integration Service, Agents
Timeline · 4 updates
-
investigating Mar 25, 2026, 05:23 PM UTC
We are currently investigating an issue affecting the creation of Integration Service connections. Further updates will be provided shortly.
-
investigating Mar 25, 2026, 05:33 PM UTC
Users are unable to create Integration Service connections in the US region.
-
monitoring Mar 25, 2026, 05:59 PM UTC
We have identified the issue and rolled out a fix. We are actively monitoring.
-
resolved Mar 25, 2026, 06:11 PM UTC
The issue has been successfully resolved, and the service is now fully operational.
- Detected by Pingoru
- Mar 23, 2026, 03:27 PM UTC
- Resolved
- Mar 23, 2026, 11:26 PM UTC
- Duration
- 7h 58m
Affected: Cloud Robots - VM
Timeline · 8 updates
-
investigating Mar 23, 2026, 03:27 PM UTC
We are investigating the incident.
-
identified Mar 23, 2026, 04:23 PM UTC
We identified the issue and are working on deploying the mitigation.
-
identified Mar 23, 2026, 05:51 PM UTC
We’ve identified the issue and are continuing to deploy mitigation. Work is ongoing, and we’re closely monitoring progress toward full resolution.
-
identified Mar 23, 2026, 07:26 PM UTC
We are actively working on a fix and taking steps to minimize the impact. We will continue to share updates as they become available.
-
monitoring Mar 23, 2026, 10:13 PM UTC
The issue has been resolved, and we are currently monitoring the system to ensure stability.
-
monitoring Mar 23, 2026, 10:17 PM UTC
The issue has been resolved, and we are currently monitoring the system to ensure stability.
-
resolved Mar 23, 2026, 11:26 PM UTC
The issue has been resolved.
-
postmortem Mar 26, 2026, 08:06 PM UTC
## _Customer Impact_
Between March 18, 2026 at 7:54 pm UTC and March 23, 2026 at 10:10 pm UTC, a subset of customers were unable to create or edit VPN Gateways within the UiPath Automation Cloud Administration portal. Attempts to perform these actions returned errors, preventing any configuration changes to VPN Gateways. Existing VPN Gateways continued to function normally throughout, while create and edit operations were affected. This issue impacted customers across all commercial cloud regions. Our regulated (GXP) environment was confirmed unaffected at any point during the incident.
## _Root Cause_
A recent update to the VPN Gateway management service introduced a change in the data structures used for creation and editing operations. There was a deployment misalignment between front-end services and back-end services, and as a result, any attempt to create or edit a VPN Gateway failed with an error, while existing VPN Gateways continued to operate normally. Recovery required deploying updated back-end services to all regions.
## _Detection_
The incident was detected on March 23, 2026 at 3:05 pm UTC, and within minutes of onset, our engineering team assembled via a dedicated incident bridge to begin an investigation. The incident was formally declared as a critical, customer-impacting event at 3:21 pm UTC, and a public impact statement was posted to our status page by 3:28 pm UTC.
## _Response_
Upon detection, our engineering team convened immediately to investigate the root cause. Initial analysis confirmed that VPN Gateway creation and editing were consistently failing across all commercial cloud regions, while existing connections remained operational. The team reviewed recent changes to the VPN Gateway management service and identified the front-end changes as the likely cause. A proposed fix was prepared by 3:09 pm UTC, and a corrective build was initiated by 4:04 pm UTC. The fix rollout began at 5:28 pm UTC. However, by 7:15 pm UTC, the team determined that this initial fix did not resolve the issue as expected. The team immediately pivoted to an alternative approach, estimating approximately one hour to build and validate the new solution. A status page update was posted at 7:19 pm UTC to inform customers that the root cause had been identified and a fix remained in progress.
The alternative fix was validated in the India region at approximately 8:41 pm UTC and confirmed to be effective. Rollout to all remaining regions began at 9:02 pm UTC and proceeded in a controlled manner. By 10:10 pm UTC, the corrected version was deployed globally, and all affected VPN Gateway creation and editing functionality was fully restored. The incident was formally marked as mitigated at 10:20 pm UTC. Status page updates were posted at key milestones throughout the incident to keep customers informed, and we are taking steps to improve our communication cadence during extended incidents (see Follow-Up below).
## _Follow-Up_
To prevent similar incidents in the future, we are implementing several targeted improvements to address the underlying issue of unsynchronized changes between frontend and backend components:
1. We are enforcing feature flags in our release process for the VPN Gateway management service, allowing frontend and backend validation changes to be deployed in a coordinated manner and enabled only when both components are fully aligned, while also ensuring these changes are thoroughly tested against a comprehensive set of configuration scenarios before reaching production (see the sketch below).
2. We are introducing new integration tests that ensure frontend and backend changes are validated together before release.
3. We are enhancing our monitoring to include dedicated alerts for failed configuration operations, enabling faster detection of issues that may arise from mismatches between client-side and server-side validation.
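A sketch of the coordination gate from follow-up 1 (the version numbers and names are assumptions for illustration): the new data structures turn on only when both sides report a compatible schema and the flag is set.

```python
MIN_COMPATIBLE_SCHEMA = 2  # illustrative version at which both sides agree

def new_gateway_schema_enabled(frontend_schema: int,
                               backend_schema: int,
                               flag_enabled: bool) -> bool:
    """Enable the new data structures only when frontend AND backend are ready."""
    both_ready = (frontend_schema >= MIN_COMPATIBLE_SCHEMA
                  and backend_schema >= MIN_COMPATIBLE_SCHEMA)
    return flag_enabled and both_ready

# During a staggered rollout the flag stays off wherever either side lags,
# so the old, mutually understood format remains in effect.
assert not new_gateway_schema_enabled(frontend_schema=2, backend_schema=1, flag_enabled=True)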
- Detected by Pingoru
- Mar 23, 2026, 11:42 AM UTC
- Resolved
- Mar 23, 2026, 12:34 PM UTC
- Duration
- 51m
Affected: Integration Service
Timeline · 4 updates
-
investigating Mar 23, 2026, 11:42 AM UTC
New connection creation in UiPath Integration Service is currently failing in the Europe region. Existing connections and running automations are not impacted. The team is actively working to resolve the issue. We will continue to provide updates on status.
-
monitoring Mar 23, 2026, 12:00 PM UTC
A fix has been deployed and the service is recovered. Current status: We are monitoring the system to ensure stability and full recovery.
-
resolved Mar 23, 2026, 12:34 PM UTC
The issue has been resolved.
-
postmortem Apr 01, 2026, 03:40 PM UTC
## Customer impact
We understand how important reliable access to your integrations is, and we sincerely apologize for the disruption this incident caused to your work. Between March 23, 2026 at 10:30 am UTC and March 23, 2026 at 11:45 am UTC (approximately 75 minutes), a subset of customers in the EU region were unable to create or authenticate connections through the UiPath Integration Service™. During this window, the vast majority of new connection requests failed, returning errors when you attempted to set up or authenticate a connection.
Retrieving existing connections, running workflows, and all previously established connections continued to function normally throughout the incident. Only operations that required establishing a new connection were affected, including requests originating from UiPath Studio™ and Studio Web. Failures began gradually around 10:15 am UTC, escalated to a near-complete disruption by 10:30 am UTC, and persisted until approximately 11:45 am UTC when the service was restored to normal operation.
## Root cause
A scheduled maintenance activity was performed on the data layer supporting the Integration Service in the US region at approximately 10:20 am UTC on March 23, 2026. During this maintenance, the data service used by the EU region's Integration Service was temporarily disrupted. A configuration mismatch caused the EU region's Integration Service deployment to reference a data service hosted in the US region rather than the correct regional service. When the maintenance activity disrupted that US-hosted service, all attempts to establish new connections in the EU region began failing—causing the service to return errors to users.
Because the connection creation and authentication flows depend on this data service, these operations also failed. The scope of impact was limited to establishing new connections; all other Integration Service operations—including retrieving existing connections and running active workflows—remained fully functional, as they did not rely on the affected service.
## Detection
Our monitoring systems detected elevated error rates on the Integration Service in the EU region at approximately 10:30 am UTC on March 23, 2026—within minutes of the disruption beginning. Our engineering team was alerted and began investigating immediately. Further analysis revealed a consistent error pattern across all failing connection requests, correlating directly with the timing of the scheduled maintenance activity in the US region. This correlation allowed our team to quickly narrow the investigation to the cross-region configuration mismatch.
## Response
Our team correlated the onset of failures at 10:30 am UTC with the scheduled maintenance in the US region. Investigation confirmed that the EU region's Integration Service was incorrectly configured to reference a US-hosted data service. Our investigation verified that every failing connection request encountered the same underlying error, confirming the cross-region configuration mismatch as the root cause. The configuration for the EU region was corrected to reference the appropriate regional service. Service recovery was observed at approximately 11:45 am UTC, with new connection creation returning to normal operation. We confirmed that only new connection creation—and the dependent authentication flows—was affected throughout the incident.
## Follow-up
We are introducing automated pre-deployment checks to validate that each regional deployment references the correct region-specific data service (see the sketch after this report). These checks will run before every deployment, preventing cross-region configuration mismatches from reaching production. In parallel, we are updating our maintenance procedures to include a mandatory step that identifies and validates all services with dependencies on the target data service before initiating any maintenance activity.
To improve our detection speed, we are adding dedicated alerts for new connection failure rates per region, with thresholds that trigger immediate notification when the error rate exceeds 5% over a five-minute window. These alerts will ensure that any similar disruption is surfaced to our engineering team within minutes. Finally, we are conducting a comprehensive audit of all production region configurations across Integration Service deployments to identify and remediate any remaining cross-region inconsistencies.
We are committed to ensuring that your experience with the Integration Service remains reliable and that the safeguards we are putting in place prevent a recurrence of this type of incident.
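A sketch of such a pre-deployment check, under an assumed config shape (region keyed to a data-service URL; the domains are hypothetical):

```python
def validate_region_configs(configs: dict) -> list:
    """configs maps region -> {'data_service_url': ...}; returns any violations."""
    violations = []
    for region, cfg in configs.items():
        url = cfg["data_service_url"]
        if f".{region}." not in url:  # convention: region appears in the hostname
            violations.append(f"{region} references an out-of-region service: {url}")
    return violations

problems = validate_region_configs({
    "eu": {"data_service_url": "https://data.us.example.com"},  # the EU->US mismatch
    "us": {"data_service_url": "https://data.us.example.com"},
})
assert problems == ["eu references an out-of-region service: https://data.us.example.com"]
```

Run as a deployment gate, this would have flagged the EU deployment pointing at a US-hosted data service before it reached production.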
- Detected by Pingoru
- Mar 20, 2026, 07:28 AM UTC
- Resolved
- Mar 20, 2026, 07:57 AM UTC
- Duration
- 28m
Affected: Integration Service
Timeline · 4 updates
-
investigating Mar 20, 2026, 07:28 AM UTC
We are currently investigating reports of an issue in Integration Service in the Singapore region. Impact: existing Integration Service connections are not visible in Orchestrator, and attempts to create a new connection result in no response. Our teams are actively working to identify the cause and assess the scope of the issue. Further updates will be shared as soon as more information becomes available.
-
investigating Mar 20, 2026, 07:29 AM UTC
We are currently investigating reports of an issue in Integration Service in multiple regions. Impact: existing Integration Service connections are not visible in Orchestrator, and attempts to create a new connection result in no response. Our teams are actively working to identify the cause and assess the scope of the issue. Further updates will be shared as soon as more information becomes available.
-
monitoring Mar 20, 2026, 07:42 AM UTC
A fix has been deployed and the service is recovering. Current status: We are monitoring the system to ensure stability and full recovery. Further updates will be shared soon.
-
resolved Mar 20, 2026, 07:57 AM UTC
The issue has been resolved. The system has remained stable during the monitoring period.
- Detected by Pingoru
- Mar 18, 2026, 07:52 PM UTC
- Resolved
- Mar 18, 2026, 10:34 PM UTC
- Duration
- 2h 42m
Affected: Agents
Timeline · 4 updates
-
identified Mar 18, 2026, 07:52 PM UTC
We have identified the cause of the disruption impacting the Prepare Python environment step for Agents in multiple regions and are actively implementing a fix. Impact: Coded agents failed at the Prepare Python environment step when requires-python>=3.12 was set in pyproject.toml. Workaround: Remove or lower the requires-python constraint in pyproject.toml to >=3.11 until the fix is deployed. Next update: Additional updates will be provided as remediation continues.
-
identified Mar 18, 2026, 08:46 PM UTC
We are actively working on implementing a fix. Impact: Coded agents failed at the Prepare Python environment step when requires-python>=3.12 was set in pyproject.toml. Workaround: Remove or lower the requires-python constraint in pyproject.toml to >=3.11 until the fix is deployed. Next update: Additional updates will be provided as remediation continues.
-
resolved Mar 18, 2026, 10:34 PM UTC
The fix has been deployed to all affected regions and the issue is resolved.
-
postmortem Apr 17, 2026, 04:53 PM UTC
## Customer Impact
Between March 16, 2026 at 12:30 pm and March 18, 2026 at 9:55 pm UTC, a subset of customers experienced failures when running coded agents. Agent execution failed during the "Prepare Python environment" step when the agent's configuration specified a minimum Python version of 3.12 or higher. Customers using UiPath Agents or coded agents configured for Python 3.11 were unaffected. The impact was limited to coded agents in all production regions except Europe, where the affected version had not yet been deployed. A workaround was available — lowering the Python version requirement to 3.11 in the agent's configuration — though this required repackaging and republishing the agent.
## Root Cause
As part of a security improvement, the Python installation method in the agent runtime environment was updated. The new configuration restricted the package manager's ability to download and install Python versions on demand at runtime. The runtime environment was deployed with only Python 3.11 pre-installed, while Python 3.12 and 3.13 — which are also supported versions — were not included. As a result, coded agents specifying a minimum Python version of 3.12 or higher could not prepare their execution environment and failed to start.
The issue originated from a version mismatch within a layered deployment architecture composed of multiple dependent components. During staging validation, the regression was identified and corrected in one component layer. However, during the subsequent promotion cycle, an older version of that layer was inadvertently referenced in the final deployment configuration. As a result, the production environment was deployed without the full set of supported Python versions.
## Detection
The issue was identified on March 18, 2026 at 7:30 pm UTC after internal stakeholders observed job failures. Following incident initiation by our engineering team, the impact was quickly confirmed to be isolated to coded agents, specifically agents requiring Python 3.12 or higher.
## Response
Our engineering team evaluated reverting the change but determined that doing so would negatively impact other agent types. The team chose a forward fix: rebuilding the runtime environment to include all supported Python versions (3.11, 3.12, and 3.13) and redeploying across affected regions. The updated environment was deployed to all affected regions by 9:55 pm UTC on March 18, 2026, and service was fully restored. The incident was formally closed at 10:37 pm UTC.
## Follow-up
To prevent similar incidents, we are implementing the following improvements:
* **Pre-deployment validation:** Automated checks in the deployment pipeline to verify that all supported Python versions are included in the runtime environment before deployment.
* **Automated alerting:** Expanded monitoring to detect and alert on failures in the agent environment preparation step.
* **Enhanced change review:** Configuration changes affecting agent runtime environments will require compatibility testing across all supported agent types before deployment.
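For reference, the workaround from the timeline amounts to a one-line change in the agent's `pyproject.toml` (the surrounding fields here are illustrative):

```toml
[project]
name = "my-coded-agent"      # illustrative project name
requires-python = ">=3.11"   # was ">=3.12", which failed the Prepare Python environment step
```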
- Detected by Pingoru
- Mar 17, 2026, 08:01 AM UTC
- Resolved
- Mar 17, 2026, 10:50 AM UTC
- Duration
- 2h 48m
Affected: IXP
Timeline · 4 updates
-
investigating Mar 17, 2026, 08:01 AM UTC
We are currently investigating degraded performance affecting IXP in the US region.
Affected functionality:
- Automations utilizing IXP may experience increased latency
- Some requests may return HTTP 503 errors
Next steps: Our team is actively investigating the root cause and working to restore full service. No action is required from customers at this time.
-
investigating Mar 17, 2026, 09:47 AM UTC
Our team is making progress in identifying the root cause and is working diligently to mitigate the issue. No action is required from customers at this time.
-
monitoring Mar 17, 2026, 10:33 AM UTC
A fix has been deployed to address the issue. We are actively monitoring the environment to confirm full service restoration.
-
resolved Mar 17, 2026, 10:50 AM UTC
The fix has been deployed and service has been restored.
- Detected by Pingoru
- Mar 17, 2026, 05:08 AM UTC
- Resolved
- Mar 17, 2026, 01:20 PM UTC
- Duration
- 8h 11m
Affected: Customer Portal
Timeline · 5 updates
-
investigating Mar 17, 2026, 05:08 AM UTC
We are experiencing degraded service on Customer Portal affecting multiple support-related features.
Affected functionality:
- Customer Support Infrastructure
- Viewing existing support case details and email replies
- Account/Company data loading
- Self-registration case status checks
- File downloads linked to support cases
What's working:
- Authentication and login
- Non-support dependent features (Knowledge Base, Community, etc.)
Root cause: An upstream dependency used by Customer Portal's support infrastructure is currently experiencing a service disruption.
Next steps: Our team is actively monitoring the situation and working to restore full service as quickly as possible. No action is required from customers at this time. We will provide updates as the situation evolves.
-
investigating Mar 17, 2026, 06:43 AM UTC
The upstream provider has identified the root cause of the disruption and is actively deploying a fix across affected instances. We are seeing gradual improvement in Customer Portal functionality.
-
monitoring Mar 17, 2026, 07:51 AM UTC
System recovery is in progress, and Customer Portal features are now operational. We are currently monitoring system stability and conducting final verification before confirming full resolution.
-
resolved Mar 17, 2026, 01:20 PM UTC
The upstream dependency issue has been fully resolved. All Customer Portal services have been restored and are operating normally. We apologize for any inconvenience caused. Thank you for your patience.
-
postmortem Mar 30, 2026, 07:09 AM UTC
## Customer Impact
**Regions Affected:** Global (all regions)
**Services Affected:** Customer Portal - Support Cases, User/Account Team features, Knowledge Base
**Timeframe:**
* **Impact Start:** March 17, 2026, 9:34 am IST
* **Impact End:** March 17, 2026, 11:55 am IST
* **Total Duration:** ~2 hours 21 minutes

**Customer Experience - Failed Operations:**
* **Support Cases:** Viewing case lists/details, creating cases and follow-ups, reopening/escalating cases, sending email replies, handling attachments, and viewing dashboard stats
* **User and Account Team:** Viewing UiPath contacts (CSM, TAM, Account Owner), updating contact flags, and accessing contact info, although cached Redis data remained temporarily available
* **Knowledge Base:** Loading KB article images failed when they were not cached

## Root Cause
On March 16-17, 2026, Salesforce experienced a major global service disruption affecting multiple production instances worldwide. This was a significant Salesforce outage, impacting customers globally.
**Salesforce Incident:** [Trust Status](https://status.salesforce.com/incidents/20003819)
Customer Portal services rely on Salesforce as the source of truth for support cases, user/contact information, and account data. As a result, the disruption to Salesforce APIs led to failures and degraded performance across all dependent functionalities.
## Detection
The issue was discovered automatically through our monitoring system within 10 minutes.
## Response
We promptly updated our status page after confirming the upstream issue, continuously monitored recovery progress from Salesforce, and validated all dependent Customer Portal functionalities as services were restored, followed by additional checks to ensure stability.
## Follow-Up
Since Salesforce was a single point of failure, there was no direct mitigation possible from our side during the outage. Our focus was therefore on timely communication, continuous monitoring, and validating end-to-end flow recovery as soon as the service was restored.
- Detected by Pingoru
- Mar 15, 2026, 08:43 PM UTC
- Resolved
- Mar 15, 2026, 08:59 PM UTC
- Duration
- 16m
Affected: Agents
Timeline · 3 updates
-
investigating Mar 15, 2026, 08:43 PM UTC
We are investigating reports of an outage impacting the Web Search tool used in RPA workflows and Agents in Multiple Regions. Impact: Users may be unable to run RPA workflows or deployed/debug Agents that rely on the Web Search tool, which may cause those automations to fail. Next update: Our teams are working to understand the cause and scope and will share updates as available.
-
identified Mar 15, 2026, 08:45 PM UTC
We have identified the cause of the outage impacting the Web Search tool used in RPA workflows and Agents in Multiple Regions and are working on a fix. Impact: Users may continue to be unable to run RPA workflows or deployed/debug Agents that rely on the Web Search tool, which may cause those automations to fail. Next update: Our focus is on restoring service as quickly as possible.
-
resolved Mar 15, 2026, 08:59 PM UTC
The issue has been resolved. Our monitoring indicates that services are operating normally, and no further impact has been observed.
- Detected by Pingoru
- Mar 13, 2026, 01:26 PM UTC
- Resolved
- Mar 13, 2026, 01:56 PM UTC
- Duration
- 30m
Affected: Studio Web
Timeline · 4 updates
-
investigating Mar 13, 2026, 01:26 PM UTC
We are currently investigating an issue impacting Studio Web in the Australia region. Further updates will be provided shortly.
-
monitoring Mar 13, 2026, 01:33 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 13, 2026, 01:56 PM UTC
The incident has been resolved, and Studio Web is now functioning normally in the Australia region.
-
postmortem Mar 18, 2026, 08:49 AM UTC
## Customer Impact
On March 13, 2026, between approximately 13:01 and 13:55 UTC, customers using Studio Web in the Enterprise Australia environment experienced a service outage. Users encountered HTTP 503 errors and were unable to access Studio Web. The incident occurred at midnight local time in Australia (AEDT), or 1 PM UTC, which significantly limited customer exposure.
## Root Cause
A routine change included a security hardening change that updated the internal port used by the service. During the staged deployment process, the updated port configuration was applied to the service routing layer before the running service instances were replaced with the new version. This created a mismatch: the routing layer directed traffic to the new port, but the existing service instances were still listening on the old port, making them unreachable.
This was an edge case in our staged deployment tooling, which does not account for port changes between versions. The port change was validated in pre-production environments, but the specific interaction with the staged deployment process was not anticipated.
## Detection
The issue was detected automatically by our monitoring systems soon after the deployment entered its validation phase.
## Response
The team quickly identified the configuration mismatch and applied a mitigation by synchronizing the routing configuration with the running service instances. All services completed their updates by 13:32 UTC, restoring full functionality. The service was confirmed fully restored at approximately 13:55 UTC after a stability monitoring period.
## Follow-Up
1. Implement pre-deployment validations for detecting such configuration changes.
2. Investigate and improve the automatic failover mechanism for the Australia environment, which did not trigger as expected during this incident.
3. Add service endpoint validation to verify that traffic actually reaches service instances through the routing layer, not just that instances are healthy (see the sketch below).
4. Develop a deployment strategy for configuration changes that affect service routing, ensuring they are applied atomically with the service update.
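Follow-up 3 can be as simple as probing the service through the router rather than the instances directly; a sketch with a hypothetical URL:

```python
import urllib.request

def reaches_through_router(router_url: str, timeout: float = 5.0) -> bool:
    """True only if a request routed through the gateway layer gets a 200 back."""
    try:
        with urllib.request.urlopen(router_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # 503s, refused connections, and timeouts all count as failures here.
        return False

# A per-instance health check can pass while this fails: exactly this incident's
# failure mode, healthy instances behind a router aimed at the wrong port.
# Example (hypothetical endpoint): reaches_through_router("https://gateway.example.com/health")
```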
- Detected by Pingoru
- Mar 13, 2026, 12:57 PM UTC
- Resolved
- Mar 13, 2026, 01:14 PM UTC
- Duration
- 17m
Affected: Studio Web
Timeline · 4 updates
-
investigating Mar 13, 2026, 12:57 PM UTC
We are currently investigating an issue impacting Studio Web in the US region. Further updates will be provided shortly.
-
monitoring Mar 13, 2026, 01:08 PM UTC
A fix has been implemented and we are monitoring the results.
-
resolved Mar 13, 2026, 01:14 PM UTC
This incident has been resolved.
-
postmortem Mar 18, 2026, 03:12 AM UTC
## Customer Impact
On March 13, 2026, between 12:31 and 13:09 UTC, customers using Studio Web in the U.S. environment experienced a service outage lasting approximately 38 minutes. Users encountered HTTP 503 errors and were unable to access Studio Web during this period. The outage began at 12:31 UTC, before the start of U.S. business hours, which limited the number of affected customers. Based on our analysis, a small number of customers experienced failed requests during the outage window.
## Root Cause
A planned infrastructure change updated the network configuration of our enterprise U.S. production environment, changing the outbound IP address used by the service. The new IP address was not added to the access rules of a critical storage resource that the service depends on. When service instances restarted, they were unable to connect to this storage resource because the new IP address was blocked by the existing network access rules.
The dependency between the environment's outbound IP and the storage resource's access rules was not tracked in the migration process, and there was no automated validation to verify that all dependent resources had been updated before the migration proceeded.
## Detection
The issue was detected automatically by our monitoring systems within minutes of the service becoming unavailable. Our on-call engineering team was engaged immediately after being alerted.
## Response
Our engineering team identified the root cause within 22 minutes of engagement by analyzing service logs and correlating them with the ongoing infrastructure migration. The issue was resolved at 13:06 UTC by updating the storage resource's network access rules to allow the new IP address. Full service restoration was confirmed at 13:09 UTC. Total time from first alert to confirmed resolution: 38 minutes.
## Follow-Up
1. Implement validation for such specific edge cases before proceeding with further infrastructure changes.
2. Audit all storage resources and network-restricted dependencies across all regions to identify and remediate similar configuration gaps before resuming the migration.
3. Implement automated pre-migration validation that verifies all dependent resources have been updated with new network configurations (see the sketch below).
4. Add dedicated monitoring for storage connectivity failures to enable faster detection of this specific failure mode.
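A sketch of the pre-migration validation from follow-up 3 (the resource names and IPs are made up): refuse to proceed until every dependent resource's access rules include the new outbound IP.

```python
def missing_allowlist_entries(new_outbound_ip: str, resources: dict) -> list:
    """resources maps resource name -> set of allowed IPs; returns the gaps."""
    return [name for name, allowed in sorted(resources.items())
            if new_outbound_ip not in allowed]

gaps = missing_allowlist_entries("203.0.113.7", {
    "storage-account": {"198.51.100.4"},           # not yet updated: must block migration
    "key-vault": {"198.51.100.4", "203.0.113.7"},
})
assert gaps == ["storage-account"]  # migration should halt until this list is empty
```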
- Detected by Pingoru
- Mar 13, 2026, 11:44 AM UTC
- Resolved
- Mar 13, 2026, 11:44 AM UTC
- Duration
- —
Timeline · 2 updates
-
resolved Mar 13, 2026, 11:44 AM UTC
Between 2026-03-13 08:00 UTC and 2026-03-13 11:00 UTC, the Integration Service event triggers experienced partial degradation due to connectivity issues with the database. There was no data loss during this period; however, event delivery may have been delayed.
-
postmortem Mar 20, 2026, 04:31 PM UTC
## _Customer Impact_
Between March 13, 2026 at 11:38 am UTC and March 13, 2026 at 11:47 am UTC, a subset of customers in the Australia region may have experienced a partial degradation of event triggers. During this period, automated workflows that rely on event-based triggers may have failed to start as expected, resulting in delays or missed executions for some scheduled or real-time automation tasks. The impact was limited to customers using event-driven automation features in the Australia region. Other regions and core platform functionality remained fully operational throughout the incident. The disruption lasted for approximately nine minutes.
## _Root Cause_
As part of a routine infrastructure maintenance activity, the configuration of a database was updated. While the update was successfully applied to one part of the service infrastructure, it was not propagated to a second part due to a gap in the manual deployment process. As a result, that portion of the service continued attempting to connect using an outdated configuration, causing connectivity failures that led to the degradation of Event Triggers for customers in Australia.
## _Detection_
The issue was detected automatically through UiPath internal monitoring and alerting systems, which identified anomalous behavior in the Event Triggers service and triggered alerts. A contributing factor was that the deployment process for this maintenance activity relied on manual steps, which introduced the risk of incomplete propagation across all parts of the service infrastructure.
## _Response_
The following steps were taken to investigate, mitigate, and resolve the issue:
* **Detection** — Automated monitoring identified the degradation and an incident was immediately declared.
* **Investigation** — The on-call engineering team began diagnosing the issue and quickly identified the root cause as a configuration mismatch introduced during the planned maintenance activity.
* **Mitigation** — The affected portion of the service infrastructure was updated with the correct configuration and restarted, restoring full connectivity.
* **Resolution** — The Event Triggers service was confirmed to be fully operational within approximately 9 minutes of the incident being declared.
## _Follow-Up_
UiPath is committed to preventing this class of issue from recurring. The following improvements are planned:
* **Short-term:** A review of routine maintenance activities will be conducted to confirm that configuration updates have been fully and consistently applied across all components of affected services.
* **Long-term — Automated deployment validation:** Configuration changes made during maintenance activities will be automatically propagated and validated across all service components, eliminating reliance on manual steps.
* **Long-term — Configuration drift detection:** Monitoring will be enhanced to detect and alert on configuration mismatches across service components before they result in customer impact (see the sketch below).
* **Long-term — Improved deployment process:** Deployment procedures for maintenance activities will be updated to include explicit validation checkpoints ensuring all service components have received and acknowledged configuration updates before the activity is considered complete.
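One way the drift detection could work, sketched under assumed data shapes (the component names are invented): fingerprint each component's effective configuration and alert when any component disagrees with the rest.

```python
import hashlib
import json

def config_fingerprint(config: dict) -> str:
    """Stable hash of a configuration, independent of key order."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

def detect_drift(component_configs: dict) -> dict:
    """Return component -> fingerprint when components disagree, else {}."""
    prints = {name: config_fingerprint(cfg) for name, cfg in component_configs.items()}
    return prints if len(set(prints.values())) > 1 else {}

drift = detect_drift({
    "trigger-worker-a": {"db_host": "db-new.internal"},
    "trigger-worker-b": {"db_host": "db-old.internal"},  # missed the manual update
})
assert drift  # non-empty: alert before customers see failed triggers
```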
- Detected by Pingoru
- Mar 12, 2026, 04:39 PM UTC
- Resolved
- Mar 12, 2026, 06:28 PM UTC
- Duration
- 1h 49m
Affected: Automation Cloud, Orchestrator, Automation Hub, AI Center, Action Center, Apps, Automation Ops, Cloud Robots - VM, Data Service, Documentation Portal, Document Understanding, Insights, Integration Service, Process Mining, Task Mining, Test Manager, IXP, Serverless Robots, Solutions Management, Customer Portal, Context Grounding, Autopilot for Everyone, Agentic Orchestration, Autopilot (Plugins), Marketplace, Agents, Autopilot for Developers
Timeline · 7 updates
-
investigating Mar 12, 2026, 04:39 PM UTC
We are investigating reports of degraded performance impacting Orchestrator in Canada. Our teams are working to identify the cause and will share more details as the investigation progresses.
-
investigating Mar 12, 2026, 05:33 PM UTC
Our teams are continuing to investigate the degraded performance impacting Orchestrator in Canada. The investigation is ongoing, and we are working to identify the root cause. We will share additional updates as more information becomes available.
-
investigating Mar 12, 2026, 05:55 PM UTC
Update: Our teams are continuing to investigate the degraded performance impacting services in Canada. The scope has expanded from Orchestrator to all services in the Canada region. The investigation is ongoing, and we are working to identify the root cause. We will share additional updates as more information becomes available.
-
monitoring Mar 12, 2026, 06:11 PM UTC
Our teams have identified the issue causing degraded performance impacting services in the Canada region and have implemented a fix. We are currently monitoring the services to ensure stability and will share additional updates if needed.
-
monitoring Mar 12, 2026, 06:18 PM UTC
Our teams have identified the issue causing degraded performance impacting customers accessing UiPath from the Canada region and have implemented a fix. We are currently monitoring the services to ensure stability and will share additional updates if needed.
-
resolved Mar 12, 2026, 06:28 PM UTC
Our teams have identified and resolved the issue that was causing degraded performance impacting customers accessing UiPath from the Canada region. A fix has been implemented, and services have recovered. We will continue monitoring to ensure ongoing stability.
-
postmortem Mar 16, 2026, 07:06 AM UTC
## Customer Impact Customers in the **Canada region** experienced intermittent slowness and timeouts when accessing UiPath Automation Cloud services—including Orchestrator, Action Center, and asset retrieval—over a period of approximately three days. UiPath was in the process of rolling out new gateway infrastructure across regions in a phased manner. The Canada region was one of the regions receiving this rollout. Because the rollout was partial—not all regions were migrated simultaneously—customers whose traffic was routed through the Canada Central gateway were affected. Services in other regions continued to operate normally. * **Impact start**: ~March 9, 2026 16:00 UTC * **Most severe window**: March 11, 08:00–12:00 UTC—request latency at the 99th percentile reached ~5 seconds \(versus a normal baseline of ~300ms\), with over 100,000 requests exceeding five seconds in a 4-hour window * **Impact end**: March 12, 17:45 UTC * **Total duration**: ~73 hours \(intermittent\), with the most acute customer-facing degradation on March 11 During this period, affected customers experienced slow page loads \(particularly Action Center and Orchestrator\), asset retrieval timeouts, and sporadic HTTP 503 errors. The error rate remained low \(peaking at 0.2% of requests\), but latency degradation caused client-side timeouts that manifested as failures for end users. ## Root Cause Customer traffic for the Canada region passes through a regional gateway that directs each request to the correct backend service. The phased rollout began in late February, with 5% of traffic routed through the new infrastructure. Fifty percent of traffic was reached on March 5, then on March 9, all traffic was switched over to the new gateway. While monitoring the gateway during the ramp-up period, no degradation or failures were observed. When all the traffic started hitting the gateway, latency was introduced but not observed. The increase in latency caused some configured thresholds to be exceeded, compounding the issue by queuing requests and triggering restarts of routing services. Routing components were restarted with fresh, empty caches. Normally, these caches allow requests to be resolved in milliseconds. Without them, each request had to fetch routing data from slower backend sources—a process that, under the additional load of the rollout, took longer than the gateway's 5-second timeout. This created a repeating cycle: new instances started up, couldn't respond fast enough, were flagged as unhealthy and restarted — only to start again with empty caches. The cycle was made worse by the Canada Central gateway running at its minimum allocated capacity, leaving insufficient room for old and new instances to run side by side during the rollout. The net effect was that a subset of customer requests experienced delays of up to 15 seconds \(due to automatic retries\), and a small number failed outright with errors. ## Detection This issue was detected via **customer reports** on March 12 at 16:54 UTC \(8:54 am PST\). The ~3-day gap between impact start \(March 9\) and detection \(March 12\) was due to: * **The issue was latency, not outright failure.** Most requests eventually succeeded after retries, so error-rate-based monitors did not trigger. * **Synthetic monitoring for the Canada region did not detect the degradation.** The synthetic checks had timeouts long enough to absorb the 15-second delays, so they continued to report as passing. 
## Response

Our engineering team began working on this issue immediately upon detection through customer reports on March 12. Within 51 minutes, we resolved the incident by switching traffic back to the previous infrastructure. Services returned to normal operation immediately, and we confirmed that customers could access all services without issues.

## Follow-Up

1. **Add a cache connectivity check to the routing service startup.** New instances will verify cache readiness before accepting traffic, preventing the cold-cache cycle from recurring (see the readiness sketch above).
2. **Increase the minimum compute capacity** for the affected gateway cluster to ensure sufficient headroom during infrastructure rollouts.
3. **Add latency-based synthetic monitoring for the Canada region.** Current checks detect only total failures; a latency threshold alert would have caught this within the first hour (a sketch follows this list).
4. **Audit all regional gateway clusters** for the same capacity risk to prevent recurrence in other regions.
5. **Add anomaly-detection-based alerts for service restarts and replica set creation.** Although restarts are expected during normal operation, multiple such events in a short timeframe are anomalous and should be investigated immediately.
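Follow-up item 3 amounts to alerting on latency rather than on errors alone. A minimal sketch of such a probe, with a hypothetical target URL and threshold (the real monitoring stack and budgets are not described in this report):

```python
"""Sketch of a latency-threshold synthetic check (hypothetical values)."""
import time
import urllib.request

ENDPOINT = "https://cloud.example.com/health"  # hypothetical probe target
SAMPLES = 20
LATENCY_BUDGET_SECONDS = 1.0  # alert well below the 5 s gateway timeout

def measure_once() -> float:
    start = time.monotonic()
    with urllib.request.urlopen(ENDPOINT, timeout=30) as resp:
        resp.read()
    return time.monotonic() - start

def check_latency() -> bool:
    latencies = sorted(measure_once() for _ in range(SAMPLES))
    # With 20 samples the slowest probe is a rough stand-in for a high
    # percentile; a real monitor would aggregate far more probes per window.
    worst = latencies[-1]
    if worst > LATENCY_BUDGET_SECONDS:
        print(f"ALERT: slowest probe took {worst:.2f}s "
              f"(budget {LATENCY_BUDGET_SECONDS:.2f}s)")
        return False
    return True

if __name__ == "__main__":
    check_latency()
```

The key property is that the check fails on slowness, not only on timeouts, so a 15-second-but-eventually-successful request would have tripped it within the first probe cycle.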
Read the full incident report →
- Detected by Pingoru
- Mar 11, 2026, 01:41 AM UTC
- Resolved
- Mar 11, 2026, 06:44 AM UTC
- Duration
- 5h 2m
Affected: Agentic Orchestration
Timeline · 8 updates
-
investigating Mar 11, 2026, 01:41 AM UTC
We are investigating reports of an outage impacting the Download Attachment functionality for Maestro across multiple regions. Impact: Users may be unable to download attachments using certain Integration Service connectors. Next update: Our teams are working to understand the cause and scope and will share updates as available.
-
identified Mar 11, 2026, 02:13 AM UTC
We have identified the issue impacting the Download Attachment functionality for Maestro across multiple regions. Impact: Users may be unable to download attachments using certain Integration Service connectors. Our teams are currently working on a hotfix to resolve the issue. Next update: We will share further updates as progress is made.
-
identified Mar 11, 2026, 03:02 AM UTC
We have identified the issue impacting the Download Attachment functionality for Maestro across multiple regions. Impact: Users may be unable to download attachments using certain Integration Service connectors. Our teams are currently working on a hotfix to resolve the issue. Next update: We will share further updates as progress is made.
-
identified Mar 11, 2026, 04:03 AM UTC
We have deployed the fix in the lower environments and verified that it is working as expected. We are currently deploying the fix to the production environments. We will provide the next update shortly.
-
identified Mar 11, 2026, 04:59 AM UTC
The deployment is currently in progress. We will share the next update as soon as more information becomes available.
-
monitoring Mar 11, 2026, 05:59 AM UTC
The fix has been rolled out to the following regions: Australia, Canada, India, US, UK, and EU, and has been verified. We are continuing deployment and verification in the remaining regions.
-
monitoring Mar 11, 2026, 06:13 AM UTC
The fix has been rolled out to all regions. We will continue to monitor through our monitoring systems for some time before marking the incident as resolved.
-
resolved Mar 11, 2026, 06:44 AM UTC
The issue has been resolved. The fix has been fully deployed across all regions, and our monitoring systems confirm that services are operating normally.
Read the full incident report →
- Detected by Pingoru
- Mar 10, 2026, 06:20 PM UTC
- Resolved
- Mar 10, 2026, 04:00 PM UTC
- Duration
- —
Timeline · 2 updates
-
resolved Mar 10, 2026, 06:20 PM UTC
On March 10, 2026, between 16:16 and 16:36 UTC, some customers experienced difficulty accessing Studio Web. During this period, our monitoring systems detected that the service was temporarily unavailable. Our investigation shows that a routine deployment occurred around the same time and that the service also experienced a short failover event; these combined conditions contributed to a brief interruption in availability. The service recovered automatically within minutes, and access to Studio Web was fully restored without manual intervention.

**Impact**: Customers may have experienced a temporary inability to access Studio Web during the time window above.

**Current Status**: The service is operating normally.
-
postmortem Apr 17, 2026, 08:48 AM UTC
## Customer Impact

On March 10, 2026, between 16:16 and 16:36 UTC, a small subset of customers in the Europe region were unable to access Studio Web. During this approximately 20-minute window, attempts to load Studio Web returned errors or routed users to a service-unavailable page. Some customers experienced a longer perceived outage — up to roughly 45 minutes — because the service-unavailable page did not automatically recover once the service was restored. The impact was limited to Studio Web for a small subset of customers in the Europe region; other UiPath services and regions were not affected.

## Root Cause

A routine deployment introduced a security hardening update that changed the internal network port Studio Web listens on, along with how our internal routing layer forwards traffic to Studio Web's service components. During our staged deployment process, the routing update was applied while some service instances were still running the previous configuration. The resulting mismatch caused production traffic to be sent to a pathway that was no longer accepting connections, producing service-unavailable errors. A compounding factor was infrastructure capacity pressure in the shared production environment at the time of the deployment, which delayed the provisioning of replacement service instances and extended the recovery window.

## Detection

The incident was detected within approximately 3 minutes of onset through our automated monitoring, which continuously checks Studio Web availability from multiple geographic locations. A subsequent alert confirmed that an automatic failover to our secondary environment had been triggered. The on-call engineering team engaged within 5 minutes of the first alert and correlated the disruption with the in-progress deployment.

## Response

Upon detection, the on-call team verified that the deployment was progressing through its staged rollout and determined that it would self-recover as new service instances came online. No manual rollback was required. The service recovered automatically within minutes, and access to Studio Web was fully restored without manual intervention by 16:36 UTC. Customers who had been routed to the service-unavailable page were advised to reload Studio Web to regain access.

## Follow-Up

To prevent similar incidents and reduce customer-facing impact from comparable conditions, we are implementing the following improvements:

1. We have updated our deployment configuration so that internal service routing changes are coordinated with service instance updates, preventing the specific mismatch that caused this incident (a sketch of one such pre-flight check follows this list).
2. We have tuned our deployment infrastructure chart to avoid service instance starvation when the shared cluster is under memory pressure, so that rollouts proceed reliably even under capacity constraints.
3. We are adding auto-recovery to the service-unavailable page so that when Studio Web is restored, the page redirects customers back to the application automatically rather than requiring a manual reload.
4. We have investigated and addressed a limitation in our automatic failover that caused only a small fraction of production traffic to route to the secondary environment during the outage. Failover is now configured to fully mitigate similar primary-environment disruptions in the future.
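The root cause here was routing being flipped to a new port before every instance was listening on it. One common guard, alluded to in follow-up item 1, is a pre-flight reachability check before the routing layer is updated. A minimal sketch, with hypothetical hostnames and port (the actual deployment tooling is not described in this report):

```python
"""Sketch: confirm backends accept connections on the new port before
updating routing (hypothetical hosts/port; real tooling not described)."""
import socket

NEW_PORT = 8443                       # assumed post-hardening listen port
BACKENDS = ["studio-web-1.internal",  # hypothetical instance hostnames
            "studio-web-2.internal"]

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def safe_to_switch_routing() -> bool:
    # Flip the routing layer only once every instance accepts
    # connections on the new port, avoiding the mismatch where
    # traffic is forwarded to a pathway nothing is listening on.
    return all(is_listening(host, NEW_PORT) for host in BACKENDS)

if __name__ == "__main__":
    print("switch routing" if safe_to_switch_routing() else "hold rollout")
```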
Read the full incident report →
- Detected by Pingoru
- Mar 10, 2026, 06:01 PM UTC
- Resolved
- Mar 11, 2026, 08:10 AM UTC
- Duration
- 14h 8m
Affected: Agents
Timeline · 5 updates
-
investigating Mar 10, 2026, 06:01 PM UTC
We are investigating reports of an outage impacting the Web Search tool used in RPA workflows and Agents in multiple regions. Impact: Users may be unable to run RPA workflows or deployed/debug Agents that rely on the Web Search tool, which may cause those automations to fail. Next update: Our teams are working to understand the cause and scope and will share updates as available.
-
identified Mar 10, 2026, 07:49 PM UTC
We have identified the cause of the outage impacting the Web Search tool used in RPA workflows and Agents in multiple regions and are working on a fix. Impact: Users may continue to be unable to run RPA workflows or deployed/debug Agents that rely on the Web Search tool, which may cause those automations to fail. Next update: Our focus is on restoring service as quickly as possible.
-
monitoring Mar 11, 2026, 07:31 AM UTC
The fix has been deployed and services have been restored. We will continue to monitor system stability before marking the incident as resolved.
-
resolved Mar 11, 2026, 08:10 AM UTC
The issue has been resolved. Our monitoring indicates that services are operating normally, and no further impact has been observed.
-
postmortem Mar 31, 2026, 02:28 PM UTC
**Customer Impact**

Between 2026-03-10 13:54 UTC and 2026-03-11 08:00 UTC, the Web Search tool experienced a service disruption. During this window, automated Agents and RPA workflows requiring real-time web search functionality encountered execution failures.

**Root Cause**

The disruption was caused by upstream API quota exhaustion triggered by an atypical spike in request volume. The legacy endpoint used for this service enforces a fixed, inelastic daily throughput limit, and because the provider has placed the endpoint on a deprecation schedule, standard scalability options are currently restricted. This forced a shift from automated infrastructure adjustments to manual traffic management to maintain service stability during the transition period.

**Resolution**

To restore service and prevent a recurrence, our engineering team deployed a multi-tenant rate-limiting policy, which ensures that high-volume requests from a single source cannot impact global service availability. Service was fully restored once the upstream provider's daily quota window reset.

**Follow-Up Actions**

* **Service Migration:** Since the current provider is deprecating this API, we are accelerating the transition to a more scalable, high-throughput web search alternative to ensure elastic capacity.
* **Granular Governance:** Permanent implementation of per-organization request caps to ensure fair resource distribution and prevent individual workflow spikes from impacting the broader user base (a sketch of such a cap follows below).
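Per-organization caps of the kind described above are commonly implemented as a token bucket keyed by tenant. A minimal in-process sketch, with hypothetical rate and burst values (the production policy and its parameters are not public):

```python
"""Sketch: per-organization token-bucket caps (hypothetical parameters)."""
import time
from collections import defaultdict

RATE_PER_SECOND = 5.0  # assumed sustained allowance per organization
BURST_CAPACITY = 20.0  # assumed short-burst headroom

class TokenBucket:
    def __init__(self) -> None:
        self.tokens = BURST_CAPACITY
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(
            BURST_CAPACITY,
            self.tokens + (now - self.last_refill) * RATE_PER_SECOND,
        )
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, TokenBucket] = defaultdict(TokenBucket)

def allow_request(org_id: str) -> bool:
    # One organization exhausting its bucket is throttled without
    # consuming the shared upstream quota available to other tenants.
    return buckets[org_id].allow()

if __name__ == "__main__":
    # Rapid-fire 25 requests from one org: roughly the burst (~20) pass.
    print([allow_request("org-a") for _ in range(25)].count(True))
```

Production deployments would typically back the buckets with a shared store rather than process memory so the cap holds across service instances, but the throttling logic is the same.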
Read the full incident report →