UiPath incident
Multiple Regions - Document Understanding - Elevated Error Rates
Affected components
- Document Understanding
Update timeline
- Investigating: Apr 16, 2026, 06:32 PM UTC
We are currently investigating an increase in failed requests affecting Document Understanding (DU) generative extraction capabilities in both the US and EU regions. Our team has identified the issue and is actively working on mitigation. During this time, users may experience higher error rates or incomplete responses when using generative extraction features. We will provide updates as we make progress toward full resolution.
- Monitoring: Apr 16, 2026, 07:23 PM UTC
We have observed that error rates affecting Document Understanding (DU) generative extraction in both the US and EU regions have subsided. The system is currently stable, and our team continues to monitor performance closely to ensure full reliability. We will share further updates if there are any changes.
- Resolved: Apr 16, 2026, 07:39 PM UTC
The issue affecting Document Understanding (DU) generative extraction in both the US and EU regions has been fully resolved. All systems are operating normally. We will continue to monitor to ensure ongoing stability.
- Postmortem: Apr 23, 2026, 06:36 PM UTC
## Customer Impact

Between April 16, 2026 at 17:31 UTC and April 16, 2026 at 18:32 UTC, a subset of customers in the US and EU regions experienced elevated error rates and incomplete responses when using the Document Understanding service's generative extraction features. During this period, affected users encountered server errors (HTTP 500) when submitting documents for automated data extraction, resulting in approximately 650 failed requests across both regions, with the US region accounting for the majority. The period of elevated errors lasted approximately one hour before subsiding. Some affected organizations experienced error rates exceeding 70% at the peak of the incident, while others saw lower but still disruptive failure rates.

## Root Cause

The incident was caused by an unexpected, temporary change in the response format of our external AI model provider's embeddings endpoint. This change broke the established interface contract between the provider and our Document Understanding service, preventing the service from correctly parsing embedding responses. Specifically, our service expected a structured response object containing a required attribute, but the provider returned an empty response (HTTP 200 with an empty body), which caused a parsing error when our service processed the response. Because the provider's endpoint continued to return a successful HTTP status code, the failure was not detectable at the network level and surfaced only during response processing within our service. The issue affected multiple unrelated organizations simultaneously across both regions and lasted approximately one hour, consistent with a temporary provider-side deployment or configuration change.
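The failure mode above can be illustrated with a short sketch. This is not the production integration: the endpoint URL, client library, and field names are assumptions chosen for illustration. It shows how a client that assumes a populated JSON body fails on an HTTP 200 with an empty body, and how validating the body before use turns the same response into an explicit contract error rather than an opaque parsing exception.

```python
import requests


class EmbeddingContractError(Exception):
    """Raised when the provider response violates the expected schema."""


# Hypothetical endpoint and payload shape; the real provider integration is internal.
EMBEDDINGS_URL = "https://provider.example.com/v1/embeddings"


def fetch_embedding(text: str) -> list[float]:
    resp = requests.post(EMBEDDINGS_URL, json={"input": text}, timeout=30)
    resp.raise_for_status()  # an HTTP 200 with an empty body passes this check

    # Validate the body before touching any fields: an empty or non-JSON body
    # is a contract break, not a transport failure.
    if not resp.content:
        raise EmbeddingContractError("provider returned HTTP 200 with an empty body")
    try:
        payload = resp.json()
    except ValueError as exc:
        raise EmbeddingContractError(f"provider returned a non-JSON body: {exc}") from exc

    data = payload.get("data")
    if not data or "embedding" not in data[0]:
        raise EmbeddingContractError("response is missing the expected data[0].embedding field")
    return data[0]["embedding"]
```

Without the explicit checks, calling `resp.json()` on an empty body raises a generic decoding error deep inside request handling, which matches the symptom described above: a successful status code followed by a parsing failure.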
## Detection

The incident was detected on April 16, 2026 at 17:44 UTC, when automated monitoring triggered alerts for an increased count of failed requests in the Document Understanding service's generative extraction features across both the US and EU regions. Analysis of service logs and error metrics revealed a sharp rise in server errors, with multiple organizations in both regions experiencing failures. The time between the onset of elevated error rates and detection was short, owing to real-time alerting and continuous monitoring of service health.

## Response

Upon detection, the engineering team immediately convened on a dedicated call to investigate the root cause. A public status page was published at approximately 18:28 UTC informing customers that the team was investigating elevated error rates and incomplete responses in the generative extraction service across the US and EU regions. Initial efforts focused on isolating the affected regions and identifying patterns in the failed requests. The team examined specific failed extraction requests, retrieved error artifacts for analysis, and compared them against successful reference cases to characterize the failure mode.

The team ruled out internal causes early in the investigation: no deployments or configuration changes had been made to our systems for approximately one week prior to the incident. Systematic log analysis confirmed that the failures were isolated to the embedding model endpoint, corroborating the external provider as the source of the disruption. A support ticket was filed with the provider the following day to obtain a detailed explanation of the change.

Error rates began subsiding approximately one hour after onset. By 18:42 UTC, the team confirmed that the incident was no longer active, though investigation into the underlying cause continued. The public status page was updated at 19:19 UTC to reflect that error rates had returned to normal and the system was stable. The incident was declared fully resolved at 19:35 UTC, with all affected services operating normally. The status page was updated with a final resolution notice, and ongoing monitoring confirmed sustained stability.

## Follow-Up

To reduce the risk of similar incidents and minimize customer impact from external dependency failures, we are implementing the following targeted improvements:

1. **Fallback and redundancy strategies:** We are evaluating and, where feasible, implementing fallback mechanisms to alternative models or providers when a contract break or provider outage is detected, reducing the duration and severity of customer impact (a minimal sketch of this pattern appears below).
2. **Improved diagnostic logging:** We are updating our systems to capture and retain raw provider responses for failed requests. This will enable faster forensic analysis during future incidents and eliminate the diagnostic gap identified during this investigation.
3. **Expanded monitoring:** We are extending our alert suite to cover this failure mode, which will significantly reduce detection time for similar incidents.

These measures build on lessons learned from this and similar past events. We are committed to delivering reliable, resilient automation services and will continue to strengthen our systems so that customers experience minimal disruption, even when external dependencies change unexpectedly.
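As an illustration of the first two follow-up items, the sketch below shows one way a fallback wrapper with failure logging could be structured. It is a hedged example, not the actual Document Understanding implementation; the provider callables, logger configuration, and function names are hypothetical.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("embeddings")


def embed_with_fallback(
    text: str,
    providers: Sequence[Callable[[str], list[float]]],
) -> list[float]:
    """Try each embedding provider in order, logging each failure before falling back.

    Each provider is a hypothetical callable that returns an embedding or raises
    on contract breaks, timeouts, or HTTP errors.
    """
    last_error: Exception | None = None
    for provider in providers:
        try:
            return provider(text)
        except Exception as exc:
            # Retain enough context for later forensic analysis; a production
            # version would also persist the raw provider response for the
            # failed request, per follow-up item 2.
            name = getattr(provider, "__name__", repr(provider))
            logger.warning("embedding provider %s failed: %s", name, exc)
            last_error = exc
    raise RuntimeError("all embedding providers failed") from last_error
```

Used with the earlier `fetch_embedding` sketch as the primary provider and an alternative model as the secondary, a temporary contract break on one provider degrades to a logged fallback rather than a customer-facing HTTP 500.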